Artificial-intelligence system surfs web to improve its performance

#artificialintelligence

Of the vast wealth of information unlocked by the Internet, most is plain text. The data necessary to answer myriad questions--about, say, the correlations between the industrial use of certain chemicals and incidents of disease, or between patterns of news coverage and voter-poll results--may all be online. But extracting it from plain text and organizing it for quantitative analysis may be prohibitively time-consuming. Information extraction--or automatically classifying data items stored as plain text--is thus a major topic of artificial-intelligence research. Last week, at the Association for Computational Linguistics' Conference on Empirical Methods in Natural Language Processing, researchers from MIT's Computer Science and Artificial Intelligence Laboratory won a best-paper award for a new approach to information extraction that turns conventional machine learning on its head.


A machine-learning system that trains itself by surfing the web

#artificialintelligence

Most successful information extraction systems operate with access to a large collection of documents. In this work, we explore the task of acquiring and incorporating external evidence to improve extraction accuracy in domains where the amount of training data is scarce. This process entails issuing search queries, extracting from new sources and reconciling the extracted values, steps which are repeated until sufficient evidence is collected. We approach the problem using a reinforcement learning framework where our model learns to select optimal actions based on contextual information. We employ a deep Q-network, trained to optimize a reward function that reflects extraction accuracy while penalizing extra effort. Our experiments on two databases – of shooting incidents, and food adulteration cases – demonstrate that our system significantly outperforms traditional extractors and a competitive meta-classifier baseline.
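The query, extract and reconcile loop described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' system: the learned deep Q-network policy is replaced here by a fixed confidence threshold, and every name (`reconcile`, `step_reward`, `gather_evidence`, `threshold`, `step_penalty`) as well as the toy search results are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (extracted value, confidence in [0, 1]).
Extraction = Tuple[str, float]


def reconcile(current: Optional[Extraction], new: Extraction) -> Extraction:
    """Keep whichever extraction has the higher confidence (an assumed rule)."""
    if current is None or new[1] > current[1]:
        return new
    return current


def step_reward(was_correct: bool, is_correct: bool,
                step_penalty: float = 0.1) -> float:
    """Assumed reward shape: change in correctness minus a cost per extra step."""
    return float(is_correct) - float(was_correct) - step_penalty


def gather_evidence(
    initial: Extraction,
    search: Callable[[int], List[str]],
    extract: Callable[[str], Extraction],
    threshold: float = 0.8,
    max_queries: int = 5,
) -> Tuple[Extraction, int]:
    """Issue queries until the best extraction is confident enough.

    A learned policy would choose the next query and decide when to stop;
    this sketch uses a fixed threshold and a query budget instead.
    """
    best = initial
    queries_used = 0
    while best[1] < threshold and queries_used < max_queries:
        for document in search(queries_used):
            best = reconcile(best, extract(document))
        queries_used += 1
    return best, queries_used


# Toy stand-ins for a search engine and an extractor (made-up data).
results = {0: ["doc_a"], 1: ["doc_b"]}
extractions = {"doc_a": ("3", 0.6), "doc_b": ("4", 0.9)}

best, used = gather_evidence(
    ("2", 0.4),
    lambda i: results.get(i, []),
    extractions.__getitem__,
)
print(best, used)  # ('4', 0.9) 2
```

The per-step penalty in `step_reward` is what discourages the loop from querying indefinitely: an extra query that does not change the answer's correctness yields a negative reward.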


A machine-learning system that trains itself by surfing the web

#artificialintelligence

MIT researchers have designed a new machine-learning system that can teach itself to extract information from text for statistical analysis when available data is scarce. This new "information extraction" system turns machine learning on its head. It works the way humans do. When we run out of data in a study (say, differentiating between fake and real news), we simply search the Internet for more data, and then we piece the new data together to make sense of it all. That differs from most machine-learning systems, which are fed as many training examples as possible so that they are more likely to handle difficult problems by matching them against patterns in the training data.


Artificial intelligence system improves performance by surfing the internet

#artificialintelligence

Researchers in the US have developed an artificial intelligence (AI) system that surfs the internet, extracts information from plain text and organizes it for quantitative analysis in very little time. Recently, at the Association for Computational Linguistics' Conference on Empirical Methods in Natural Language Processing, researchers from the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory won a best-paper award for a new approach to information extraction that turns conventional machine learning on its head. Most machine-learning systems work by combing through training examples and looking for patterns that correspond to classifications provided by human annotators. In their new paper, the MIT researchers train their system on scanty data -- because in the scenario they're investigating, that's usually all that's available. The system then makes up for the limited information by searching the web for additional sources, which turns the extraction into an easier problem to solve.