Quantum Criticism: A Tagged News Corpus Analysed for Sentiment and Named Entities
Badgujar, Ashwini, Chen, Sheng, Wang, Andrew, Yu, Kai, Intrevado, Paul, Brizan, David Guy
Several custom web scrapers were created to retrieve news articles from online news organizations. All scrapers were run every two hours to retrieve articles from the following five news sites: the Atlantic, the British Broadcasting Corporation (BBC) News, Fox News, the New York Times and Slate Magazine. The scrapers continue to run every two hours, adding newly published articles to the corpus. The scrapers take each news organization's RSS feed as input and store the scraped output in a custom database. Article URLs were used to identify duplicates: where two scraped articles shared a URL, the most recently retrieved version replaced the earlier one. As of November 2019, we had collected a total of 105,000 news articles from the five media organizations. Figure 2 depicts the cumulative number of articles scraped for each news organization over time. Although regular scraping of Fox News began four months later than for the other news sources, its article count rose quickly, and Fox News is now the news organization with the most scraped articles. Given that the scrapers run at regularly scheduled two-hour intervals for all news organizations, this suggests that Fox News updates its RSS feed with new articles far more often than the others, and that the Atlantic updates its RSS feed far less frequently.
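The RSS-to-database step with URL-based replacement can be illustrated with a minimal sketch. It is not the authors' implementation: the SQLite store, the table layout, and the helper names `parse_rss_items`, `open_store` and `store_article` are all assumptions made for illustration, and fetching the feed and the article body over the network is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml):
    """Return (url, title) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        url = item.findtext("link")
        title = item.findtext("title", default="")
        if url:  # skip items without a link; the URL is the article key
            items.append((url.strip(), title.strip()))
    return items


def open_store(path=":memory:"):
    """Open a hypothetical article store keyed by URL (schema is an assumption)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        " url TEXT PRIMARY KEY,"
        " source TEXT NOT NULL,"
        " title TEXT NOT NULL,"
        " retrieved_at TEXT NOT NULL)"
    )
    return conn


def store_article(conn, url, source, title, retrieved_at):
    """Insert an article; a row with the same URL is replaced by the new one."""
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, source, title, retrieved_at)"
        " VALUES (?, ?, ?, ?)",
        (url, source, title, retrieved_at),
    )
    conn.commit()
```

Because scrapes run in chronological order, replacing on a URL conflict keeps the most recently retrieved version. A scheduler (for example, a cron job firing every two hours) would call `parse_rss_items` on each feed and pass every result to `store_article`.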
Jun-5-2020
- Country:
- Africa > South Africa (0.04)
- Asia
- India (0.04)
- Middle East > Saudi Arabia
- Arabian Gulf (0.04)
- Europe
- France (0.04)
- Ireland (0.04)
- United Kingdom (0.04)
- Indian Ocean > Arabian Gulf (0.04)
- North America > United States
- California
- Los Angeles County > Los Angeles (0.04)
- San Francisco County > San Francisco (0.15)
- New York (0.05)
- Oceania
- Australia (0.04)
- New Zealand (0.04)
- Genre:
- Research Report (0.64)
- Industry:
- Government > Regional Government
- Media > News (1.00)
- Technology: