Section 4: Discussion (from DOI:10.14218/ERHM.2020.00023)
This population-based study shows that there were 636,282 new cases and,325 deaths of COVID-19 reported in the USA from March 1 to April 15, 2020. It also shows that search-interest in COVID, COVID pneumonia, and COVID heart was highly correlated with COVID-19 daily new cases and new deaths, with delays of 12 days and 19 days, respectively. However, the prediction accuracy of these models appeared low during a 7-day follow-up.
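The lagged correlation described above can be illustrated with a minimal sketch. It is not the semiparametric model used in this study; it only shows, on synthetic series, how a delay between search-interest and daily counts could be estimated by shifting one series against the other and computing Pearson's r at each shift. The function names (`lagged_pearson`, `best_lag`), the bell-shaped curves, and the 25-day search window are assumptions made for illustration.

```python
import numpy as np


def lagged_pearson(search, outcome, lag):
    """Pearson r between search[t] and outcome[t + lag], with lag >= 0 in days."""
    search = np.asarray(search, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    if len(search) != len(outcome):
        raise ValueError("series must have equal length")
    if lag < 0 or lag > len(search) - 2:
        raise ValueError("lag must leave at least two overlapping points")
    x = search[: len(search) - lag]   # search-interest on day t
    y = outcome[lag:]                 # outcome on day t + lag
    return float(np.corrcoef(x, y)[0, 1])


def best_lag(search, outcome, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the highest Pearson r."""
    scores = {lag: lagged_pearson(search, outcome, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic series for illustration only; these are not study data.
days = np.arange(60)
search_interest = np.exp(-((days - 20) ** 2) / 50.0)
# Same curve shape, shifted 12 days later and rescaled (assumed for the demo).
daily_cases = 1000.0 * np.exp(-((days - 32) ** 2) / 50.0)

lag, r = best_lag(search_interest, daily_cases, max_lag=25)
print(f"best lag: {lag} days, Pearson r = {r:.3f}")
```

Because the synthetic outcome is an exact shifted and rescaled copy of the search curve, the scan recovers the built-in 12-day shift. Real series are noisier, and a high r at the best lag does not by itself imply good out-of-sample prediction.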
To our knowledge, this study provides the first evidence that search-interest pertinent to COVID-19 is highly correlated with the trends in COVID-19 daily new cases and new deaths in the USA. The approximately 7-day difference in lag time between daily new cases and deaths (12 vs. 19 days) suggests the possibility of a 7-day interval between COVID-19 diagnosis and death in some patients. Additional studies are warranted to investigate this hypothesis. Our findings enable us to model daily new cases and deaths in the USA during the early phase (March 1 to April 8) of the COVID-19 outbreak and may help in preventing and preparing for future outbreaks and the associated burden of COVID-19.
The 12-day lag time that we observed in the USA was longer than the 9 days previously reported in China. Several factors may contribute to this difference, and each should be examined in additional studies. First, there was a significant delay in testing for COVID-19 in the USA, which might have lengthened the lag between the trends of search-interest and daily incidence. Second, the U.S. Centers for Disease Control and Prevention (CDC) recommended a priority-based testing strategy that allowed some subjects considered low-priority to go untested when COVID-19 tests were in short supply. The criteria for COVID-19 testing in the USA were therefore different from those in China and Europe, where the WHO criteria were adopted. Thus, patients who met the WHO criteria may not have been tested and subsequently may not have been included in the daily incidence in the USA; this could lead to underreporting of daily incidence. Third, biological and socioeconomic differences between U.S. and Chinese patients may also contribute to the difference. Finally, the prevalent COVID-19 subtypes in the USA may differ from those in China and result in different lag times.
This study provides several lines of valuable evidence. First, COVID-19 daily new deaths in the USA are poorly understood, and are here described and studied using a semiparametric model. Second, we examined nine COVID-19-related search terms, more than the two used in a previous study. Our data also suggest that pneumonia and heart problems were highly relevant to the daily new cases and deaths in the USA. This finding may be explained by the frequent pneumonia and cardiac injuries seen in COVID-19 patients. Third, the lag time in our study was longer than that previously reported in China (12 days vs. 9 days). The 12- and 19-day lag times, however, also afforded us the opportunity to assess a model's prediction accuracy over a longer period of future trends. Fourth, comparing predicted values with prospectively collected data should significantly reduce recall and selection biases.
We will continue updating the models' accuracies as more data become available (see https://github.com/thezhanglab/COVID-US-google). Indeed, we found very high correlation in retrospective modeling but low accuracy in prediction, suggesting that the search-interest based model may be more helpful in predicting the daily-incidence peak or an early outbreak than post-peak or post-intervention trends. The unexpectedly low accuracy of model prediction was due to significant attenuation of the trend into a plateau. This attenuation may be linked to the U.S. CDC's April 3 recommendation to wear masks, which came 5 days before our model's peak time and matched the median COVID-19 incubation time of 5 days. Finally, to our knowledge, we are the first to examine the correlations of search-interest with COVID-19 daily new cases and deaths in the USA, and we show greater correlations (Pearson's r > 0.97) than those reported in the Chinese data.
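The contrast between high retrospective correlation and low prediction accuracy can be made concrete with a simple error measure over a follow-up window. The sketch below uses the mean absolute percentage error, which is an assumed metric chosen for illustration and is not stated here as the accuracy measure of this study. The function name and the numbers are hypothetical.

```python
def mean_absolute_percentage_error(observed, predicted):
    """Mean of |observed - predicted| / |observed|, expressed in percent."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("observed values must be non-zero")
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Made-up numbers for illustration only; these are not study data.
observed_counts = [100, 200, 400]
predicted_counts = [110, 180, 400]

mape = mean_absolute_percentage_error(observed_counts, predicted_counts)
print(f"MAPE over follow-up: {mape:.2f}%")
```

A model fitted to a rising trend would tend to overpredict once the observed counts flatten, which would show up as a growing percentage error across the follow-up days even when the in-sample correlation was high.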
This study is limited by the retrospective nature of its modeling and may therefore carry related biases. Moreover, because testing strategies and criteria differed between the USA and other countries, comparisons of our findings with those from other countries should be interpreted with caution. Finally, the data from the Johns Hopkins data repository were not independently validated or authenticated. However, our sensitivity study using the 1-point-3-acres data confirmed a similar correlation of search-interest with COVID-19 daily new cases and deaths in the USA.