Accurate estimation of influenza epidemics using Google search data via ARGO

Shihao Yang, Mauricio Santillana, S. C. Kou

arXiv.org Machine Learning 

Accurate real-time tracking of influenza outbreaks helps public health officials make timely and meaningful decisions that could save lives. We propose an influenza tracking model, ARGO (AutoRegression with GOogle search data), that uses publicly available online search data. In addition to having a rigorous statistical foundation, ARGO outperforms all previously available Google-search-based tracking models, including the latest version of Google Flu Trends, even though it uses only low-quality search data from the publicly available Google Trends and Google Correlate websites as input. ARGO not only incorporates the seasonality in influenza epidemics but also captures changes in people's online search behavior over time. ARGO is also flexible, self-correcting, robust, and scalable, making it a potentially powerful tool that can be used for real-time tracking of other social events at multiple temporal and spatial resolutions.

There are some minor differences between this preprint and the published paper.

Big data sets are constantly generated nowadays as the activities of millions of users are collected from internet-based services. Numerous studies have suggested that these big data sets have great potential to detect and manage epidemic outbreaks (influenza [1, 2, 3, 4, 5, 6], Ebola [7], dengue [8]), predict changes in stock prices [9, 10] and housing prices [11], etc. In 2009, Google Flu Trends (GFT), a digital disease detection system that uses the volume of selected Google search terms to estimate current influenza-like illness (ILI) activity, was identified by many as a good example of how big data would transform traditional statistical predictive analysis [12].
