WikiTableQuestions: a Complex Real-World Question Understanding Dataset - The Stanford Natural Language Processing Group

@machinelearnbot

Natural language question understanding has been one of the most important challenges in artificial intelligence. Indeed, eminent AI benchmarks such as the Turing test require an AI system to understand natural language questions of varying topics and complexity, and then respond appropriately. During the past few years, we have witnessed rapid progress in question answering technology, with virtual assistants like Siri, Google Now, and Cortana answering daily life questions, and IBM Watson beating human contestants on Jeopardy! Many of the questions these systems encounter are simple lookup questions (e.g., "Where is Chichen Itza?" or "Who's the manager of Man Utd?"). The answers can be found by searching the surface forms.



The Best Public Datasets for Machine Learning

#artificialintelligence

First, a couple of pointers to keep in mind when searching for datasets. Kaggle: A data science site that hosts a variety of interesting, externally contributed datasets. You can find all kinds of niche datasets in its master list, from ramen ratings to basketball data and even Seattle pet licenses. Although the datasets are user-contributed, and thus vary in cleanliness, the vast majority are clean. VisualData: Discover computer vision datasets by category; it allows searchable queries.


The Best 25 Datasets for Natural Language Processing - Gengo AI

#artificialintelligence

Where's the best place to look for free online datasets for NLP? We combed the web to create the ultimate cheat sheet, broken down into datasets for text, audio speech, and sentiment analysis. Sentiment140: A popular dataset of 1.6 million tweets with emoticons pre-removed. Twitter US Airline Sentiment: Twitter data on US airlines from February 2015, with tweets classified as positive, negative, or neutral. Yelp Reviews: An open dataset released by Yelp, containing more than 5 million reviews.