In 2014, the average SAT test taker correctly answered 49 percent of the test's math questions. Today, a new software program is close to doing the same. In a paper published Monday, researchers at the Allen Institute for Artificial Intelligence (AI2) and the University of Washington revealed that their artificial intelligence (AI) system, known as GeoSolver, or GeoS for short, is able to answer "unseen and unaltered" geometry problems on par with humans. According to a report released by the College Board, the average SAT math score in 2014 was 513. Though GeoS has only been tested on geometry questions, if the system's accuracy were extrapolated to the full math section, GeoS would have scored 500.
Given the well-known limitations of the Turing Test, there is a need for objective tests to both focus attention on, and measure progress towards, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling, which are critical skills for any machine that lays claim to intelligence. In addition, standardized tests have all the basic requirements of a practical test: they are accessible, easily comprehensible, clearly measurable, and offer a graduated progression from simple tasks to those requiring deep understanding of the world. Here we propose this task as a challenge problem for the community, summarize our state-of-the-art results on math and science tests, and provide supporting datasets.
Automatically solving geometry questions is a long-standing AI problem. A geometry question typically includes a textual description accompanied by a diagram. The first step in solving geometry questions is diagram understanding, which consists of identifying visual elements in the diagram, their locations, their geometric properties, and aligning them to corresponding textual descriptions. In this paper, we present a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data. We show that the method's objective function is submodular; thus we are able to introduce an efficient method for diagram understanding that is close to optimal. To empirically evaluate our method, we compile a new dataset of geometry questions (textual descriptions and diagrams) and compare with baselines that utilize standard vision techniques. Our experimental evaluation shows an F1 boost of more than 17% in identifying visual elements and 25% in aligning visual elements with their textual descriptions.
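The link between a submodular objective and an efficient, near-optimal method can be illustrated with a minimal sketch. It assumes a monotone submodular objective and a limit on how many elements may be selected, in which case plain greedy selection is known to reach at least a (1 - 1/e) fraction of the optimal value. The coverage objective, the element names in `COVERS`, and the `budget` parameter below are hypothetical stand-ins chosen for illustration; they are not the objective or the algorithm from the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set

# Hypothetical candidates: each detected primitive "explains" some diagram
# pixels or text mentions, identified here by integer ids (toy data).
COVERS: Dict[str, Set[int]] = {
    "line_AB": {1, 2, 3},
    "line_AB_dup": {2, 3},   # redundant detection of the same line
    "circle_O": {4, 5},
    "noise": set(),          # explains nothing
}


def coverage(selected: FrozenSet[str]) -> float:
    """Toy monotone submodular objective: number of distinct items explained."""
    explained: Set[int] = set()
    for name in selected:
        explained |= COVERS[name]
    return float(len(explained))


def greedy_maximize(
    candidates: Iterable[str],
    objective: Callable[[FrozenSet[str]], float],
    budget: int,
) -> Set[str]:
    """Repeatedly add the candidate with the largest positive marginal gain."""
    selected: Set[str] = set()
    remaining = set(candidates)
    current = objective(frozenset(selected))
    while remaining and len(selected) < budget:
        best, best_gain = None, 0.0
        for cand in sorted(remaining):  # sorted so ties break deterministically
            gain = objective(frozenset(selected | {cand})) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no candidate improves the objective any further
            break
        selected.add(best)
        remaining.remove(best)
        current += best_gain
    return selected


if __name__ == "__main__":
    chosen = greedy_maximize(COVERS, coverage, budget=3)
    print(sorted(chosen), coverage(frozenset(chosen)))
```

In this toy run the redundant detection and the noise element are never selected, because once `line_AB` is chosen their marginal gain is zero. That diminishing-returns behavior is the property submodularity formalizes.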
Artificial intelligence-related research has tremendous potential to become useful in practical, everyday applications and to dramatically increase productivity. The field has been developing rapidly in recent years and is expected to really start taking off in the near future. A few of the cool things happening on the cutting edge in AI are highlighted below.
AI Crossword App Could Help Machines Understand Language
Researchers have designed a web-based platform that uses artificial neural networks to answer standard crossword clues better than existing commercial products specifically designed for the task. The system, which is freely available online, could help machines understand language more effectively.
Clark, Peter; Etzioni, Oren; Khashabi, Daniel; Khot, Tushar; Mishra, Bhavana Dalvi; Richardson, Kyle; Sabharwal, Ashish; Schoenick, Carissa; Tafjord, Oyvind; Tandon, Niket; Bhakthavatsalam, Sumithra; Groeneveld, Dirk; Guerquin, Michal; Schmitz, Michael
AI has achieved remarkable mastery over games such as Chess, Go, Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even in 2016, the best AI system achieved merely 59.3% on an 8th Grade science exam challenge. This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple-choice (NDMC) questions. In addition, our Aristo system, building upon the success of recent language models, exceeded 83% on the corresponding Grade 12 Science Exam NDMC questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern NLP methods can result in mastery of this task. While not a full solution to general question-answering (the questions are multiple choice, and the domain is restricted to 8th Grade science), it represents a significant milestone for the field.