Difference between revisions of "Section 2: Methods (from DOI:10.14218/ERHM.2020.00023)"

From Wikibase.slis.ua.edu

Latest revision as of 15:06, 23 June 2020


Navigation
Article: Trends and Prediction in Daily New Cases and Deaths of COVID-19 in the United States: An Internet Search-Interest Based Model
Sections in this Publication
Section: Section 1: Introduction (from DOI:10.14218/ERHM.2020.00023)
Section: Section 2: Methods (from DOI:10.14218/ERHM.2020.00023)
Section: Section 3: Results (from DOI:10.14218/ERHM.2020.00023)
Section: Section 4: Discussion (from DOI:10.14218/ERHM.2020.00023)
Section: Section 5: Future directions (from DOI:10.14218/ERHM.2020.00023)
Section: Section 6: Conclusions (from DOI:10.14218/ERHM.2020.00023)
Section: References (from DOI:10.14218/ERHM.2020.00023)
Named Entities in this Section
Entity: Cardiac Death (disease - MeSH descriptor)
Entity: COVID-19 (disease - MeSH supplementary concept)
Entity: 2019 novel coronavirus (species)
Entity: Pneumonia (disease - MeSH descriptor)
Entity: Cough (disease - MeSH descriptor)
Entity: Diabetes Mellitus (disease - MeSH descriptor)
Dataset: Pubtator Central BioC-JSON formatted article files

From publication: "Trends and Prediction in Daily New Cases and Deaths of COVID-19 in the United States: An Internet Search-Interest Based Model" published as Explor Res Hypothesis Med; 2020 Apr 18 ; 5 (2) 1-6. DOI: https://doi.org/10.14218/ERHM.2020.00023

Section 2: Methods

Daily new cases and daily new deaths of COVID-19 in the USA were extracted on April 9, 2020, from 1-point-3-acres.com and the Johns Hopkins COVID-19 data repository, respectively, for modelling. We later obtained additional data from these sites to evaluate our models' accuracy using Pearson's correlation coefficients. We used a semiparametric model with two components. The parametric component predicted the daily new-case or new-death value from a given Google Trends search interest using Pearson's correlation. The second component assigned that predicted value to the date corresponding to the given search interest; because Google Trends search interest versus time has no finite dimensionality, this component is non-parametric.

Data from the World Health Organization (WHO) Situation Reports appeared significantly inconsistent and were therefore not used. According to the 1-point-3-acres.com website, its data were extracted from various media and government websites, were manually verified, and have been used by various parties, including the Johns Hopkins COVID-19 data repository, WHO, and many others. Because the study used publicly available, de-identified data containing no protected health information, it was exempt from Institutional Review Board approval (Category 4).

We used Google Trends to extract search-interest data for COVID-19-related search terms over the period March 1 to April 7, 2020. Based on COVID-19 symptoms, common terms for COVID-19, and common diseases in the USA, we chose the search terms "COVID-19," "COVID," "coronavirus," "SARS-CoV2," "pneumonia," "high temperature," "cough," "COVID heart," "COVID pneumonia," and "COVID diabetes." A Google Trends search-interest value represents search interest relative to the highest search interest for the given time period and region. A value of 100 is the peak popularity for the term, while a value of 0 means there were not enough data for the term.
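
The relative 0 to 100 scale described above can be illustrated with a minimal Python sketch. It is a simplification and not Google's actual procedure: the helper name `to_search_interest`, the rounding rule, and the input numbers are all assumptions made for illustration.

```python
def to_search_interest(raw_values):
    """Rescale a series so that its highest value becomes 100.

    Hypothetical helper: Google Trends works from search shares and applies
    its own rounding and thresholds, which are not reproduced here.
    """
    peak = max(raw_values)
    if peak <= 0:
        # No measurable interest anywhere in the period.
        return [0 for _ in raw_values]
    return [round(100 * value / peak) for value in raw_values]


# Made-up illustrative numbers, not study data.
raw = [20, 40, 80, 60]
interest = to_search_interest(raw)
print(interest)
```

Because every value is divided by the peak of its own series, search interests for different terms or regions are not directly comparable in absolute volume.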

We then examined the lag correlations of the terms' search interests with COVID-19 daily new cases and daily new deaths, as described before. The lag time was defined as the difference between a data point's original time and its shifted time in the lag-correlation study. We considered lag times of up to 20 days for daily new cases and up to 23 days for daily new deaths. The terms with the three highest correlation coefficients were used to build the respective generalized linear models. Based on these models, we used the existing search interests to predict future COVID-19 daily new cases and new deaths in the USA, for comparison with prospectively collected data to assess prediction accuracy.
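
The lag-correlation and prediction steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' Stata code: the text does not give the family or link of the generalized linear models, so the sketch assumes a Gaussian family with identity link, which reduces to an ordinary least-squares line. The function names and the data series are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either series is constant


def lag_correlations(interest, outcome, max_lag):
    """Map each lag (in days) to r between interest[t] and outcome[t + lag]."""
    return {
        lag: pearson_r(interest[: len(interest) - lag], outcome[lag:])
        for lag in range(max_lag + 1)
    }


def fit_line(x, y):
    """Least-squares slope and intercept (assumed Gaussian GLM, identity link)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustrative series, not study data: outcome trails interest by 2 days.
interest = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
outcome = [0, 0] + [2 * v + 1 for v in interest[:-2]]

r_by_lag = lag_correlations(interest, outcome, max_lag=4)
best_lag = max(r_by_lag, key=r_by_lag.get)

# Fit on the aligned pairs, then project the most recent search interests forward.
slope, intercept = fit_line(interest[: len(interest) - best_lag], outcome[best_lag:])
forecast = [slope * v + intercept for v in interest[len(interest) - best_lag:]]
```

Each forecast value belongs to the date `best_lag` days after the search interest it was computed from, which corresponds to the date-assignment step of the semiparametric model. The sketch selects the lag with the largest r rather than the largest absolute r, which is a design choice of this example.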

All statistical analyses were carried out using Stata (version 15). The models' accuracy was assessed using Pearson's r. All p values were two-sided, and only p < 0.05 was considered statistically significant.
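
For reference, one standard way to obtain a two-sided p value for Pearson's r is the t test below. The text does not state which procedure was used in Stata, so this is an assumption. Here n is the number of paired observations, x_i and y_i are the paired values (for example, predicted and observed daily counts), and T_{n-2} denotes a Student t variable with n - 2 degrees of freedom.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = r \sqrt{\frac{n - 2}{1 - r^2}},
\qquad
p = 2\,\Pr\left( T_{n-2} \ge |t| \right)
```

Under this test, the null hypothesis of zero correlation is rejected at the stated threshold when p < 0.05.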