Tracking Epidemics with Natural Language Processing and Crowdsourcing
Munro, Robert (Stanford University) | Gunasekara, Lucky (EpidemicIQ) | Nevins, Stephanie (EpidemicIQ) | Polepeddi, Lalith (EpidemicIQ) | Rosen, Evan (Stanford)
The first indication of a new outbreak is often in unstructured data (natural language) and reported openly in traditional or social media as a new 'flu-like' or 'malaria-like' illness, weeks or months before the new pathogen is eventually isolated. We present a system for tracking these early signals globally, using natural language processing and crowdsourcing. By comparison, search-log-based approaches, while innovative and inexpensive, are often a trailing signal that follows open reports in plain language. Concentrating on discovering outbreak-related reports in big open data, we show how crowdsourced workers can create near-real-time training data for adaptive active-learning models, addressing the lack of broad-coverage training data for tracking epidemics. This is well suited to an outbreak information-flow context, where sudden bursts of information about new diseases or locations need to be processed manually at short notice.
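As an illustration of the active-learning idea in the abstract, the following is a minimal sketch and not the authors' system. It assumes a toy naive Bayes classifier and uncertainty sampling (querying the report whose score is closest to 0.5); the function names and the `ask_crowd` callback, which stands in for a crowdsourced worker, are hypothetical.

```python
import math
from collections import Counter

def train(labeled):
    """Fit a tiny naive Bayes model from (text, is_outbreak) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in labeled:
        docs[label] += 1
        counts[label].update(text.lower().split())
    return counts, docs

def prob_outbreak(model, text):
    """Return the model's P(outbreak | text), using add-one smoothing."""
    counts, docs = model
    vocab_size = len(set(counts[True]) | set(counts[False])) + 1
    total_pos = sum(counts[True].values()) + vocab_size
    total_neg = sum(counts[False].values()) + vocab_size
    log_odds = math.log((docs[True] + 1) / (docs[False] + 1))
    for word in text.lower().split():
        p_pos = (counts[True][word] + 1) / total_pos
        p_neg = (counts[False][word] + 1) / total_neg
        log_odds += math.log(p_pos / p_neg)
    return 1 / (1 + math.exp(-log_odds))

def most_uncertain(model, pool):
    """Pick the unlabeled report whose score is closest to 0.5."""
    return min(pool, key=lambda text: abs(prob_outbreak(model, text) - 0.5))

def active_learning_round(labeled, pool, ask_crowd):
    """Send the most uncertain report to the crowd, then retrain."""
    model = train(labeled)
    query = most_uncertain(model, pool)
    pool.remove(query)
    # ask_crowd is a hypothetical stand-in for a crowdsourced worker's judgment.
    labeled.append((query, ask_crowd(query)))
    return train(labeled), query
```

Calling `active_learning_round` repeatedly as new reports arrive means each crowd judgment goes to the report the current model is least sure about, which is one common way to adapt quickly to a previously unseen disease or location.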
Mar-25-2012
- Country:
- Africa > Uganda (0.04)
- Asia > India (0.04)
- Europe > Germany (0.04)
- North America > United States
- Genre:
- Research Report (0.46)