The Hurricane Sandy Twitter Corpus
Wang, Haoyu (Carnegie Mellon University) | Hovy, Eduard (Carnegie Mellon University) | Dredze, Mark (Johns Hopkins University)
The growing use of social media has made it a critical component of disaster response and recovery efforts. For both preparedness and response, public health officials and first responders have turned to automated tools to help organize and visualize large streams of social media data. In turn, this has spurred new research into algorithms for information extraction, event detection and organization, and information visualization. One challenge for these efforts has been the lack of a common disaster-response corpus on which researchers can compare and contrast their work. This paper describes the Hurricane Sandy Twitter Corpus: 6.5 million geotagged Twitter posts from the geographic area and time period of Hurricane Sandy in 2012.
Mar-1-2015
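The corpus is defined by two filters on geotagged posts: a geographic area and a time period. The following is a minimal Python sketch of that kind of selection, not the authors' collection procedure, which the abstract does not describe. The `Post` record, the `BoundingBox` representation (a simple latitude/longitude rectangle), the `select_posts` function, and the coordinates and dates in the usage example are all hypothetical and do not reflect the corpus's actual region or window.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Post:
    """Hypothetical minimal record for a geotagged post."""
    text: str
    lat: float
    lon: float
    created_at: datetime


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical lat/lon rectangle (does not handle boxes crossing the antimeridian)."""
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def select_posts(posts, box, start, end):
    """Keep posts whose geotag lies inside `box` and whose timestamp is in [start, end)."""
    return [
        p for p in posts
        if box.contains(p.lat, p.lon) and start <= p.created_at < end
    ]


if __name__ == "__main__":
    # Illustrative values only; NOT the corpus's actual area or time period.
    box = BoundingBox(south=38.0, west=-77.0, north=42.0, east=-72.0)
    start = datetime(2012, 10, 27, tzinfo=timezone.utc)
    end = datetime(2012, 11, 3, tzinfo=timezone.utc)
    posts = [
        Post("inside area and window", 40.7, -74.0, datetime(2012, 10, 29, tzinfo=timezone.utc)),
        Post("outside area", 34.0, -118.2, datetime(2012, 10, 29, tzinfo=timezone.utc)),
        Post("outside window", 40.7, -74.0, datetime(2012, 11, 10, tzinfo=timezone.utc)),
    ]
    for p in select_posts(posts, box, start, end):
        print(p.text)
```

Using a half-open interval `[start, end)` means adjacent time windows never double-count a post. Timestamps are kept timezone-aware so that comparisons are unambiguous.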