The 25 Best Datasets for Natural Language Processing


Where's the best place to look for free online datasets for NLP? We combed the web to create the ultimate cheat sheet, broken down into datasets for text, audio and speech, and sentiment analysis.

Sentiment140: A popular dataset of 1.6 million tweets with the emoticons removed.

Twitter US Airline Sentiment: Tweets about US airlines from February 2015, each classified as positive, negative, or neutral.

Yelp Reviews: An open dataset released by Yelp that contains more than 5 million reviews.
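Sentiment datasets like these usually need a small amount of parsing before they are useful. The sketch below assumes the six-field CSV layout documented for Sentiment140 (polarity, tweet id, date, query, user, text), with polarity coded as 0 = negative, 2 = neutral, 4 = positive. The function name `parse_sentiment140` and the two sample rows are hypothetical and made up purely for illustration; they are not taken from the real files.

```python
import csv
import io

# Polarity codes as documented for Sentiment140 CSV files (assumed layout).
POLARITY = {"0": "negative", "2": "neutral", "4": "positive"}


def parse_sentiment140(lines):
    """Yield (label, text) pairs from Sentiment140-style CSV rows.

    Each row is assumed to have six fields:
    polarity, tweet id, date, query, user, text.
    """
    for row in csv.reader(lines):
        if len(row) != 6:
            continue  # skip rows that do not match the assumed layout
        polarity, _tweet_id, _date, _query, _user, text = row
        yield POLARITY.get(polarity, "unknown"), text


# Two made-up rows in the assumed format, for illustration only.
sample = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","my flight got cancelled"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user_b","great service today"\n'
)

for label, text in parse_sentiment140(sample):
    print(label, "|", text)
```

Using a generator keeps memory flat, which matters for a file with over a million rows. When opening the real file, if it fails to decode as UTF-8, a single-byte encoding such as latin-1 is a common workaround.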