geometry question
GeomVerse: A Systematic Evaluation of Large Models for Geometric Reasoning
Kazemi, Mehran | Alvari, Hamidreza | Anand, Ankit | Wu, Jialin | Chen, Xi | Soricut, Radu
Large language models have shown impressive results for multi-hop mathematical reasoning when the input question is only textual. Many mathematical reasoning problems, however, contain both text and image. With the ever-increasing adoption of vision language models (VLMs), understanding their reasoning abilities for such problems is crucial. In this paper, we evaluate the reasoning capabilities of VLMs along various axes through the lens of geometry problems. We procedurally create a synthetic dataset of geometry questions with controllable difficulty levels along multiple axes, thus enabling a systematic evaluation. The empirical results obtained using our benchmark for state-of-the-art VLMs indicate that these models are not as capable in subjects like geometry (and, by generalization, other topics requiring similar reasoning) as suggested by previous benchmarks. This is made especially clear by the construction of our benchmark at various depth levels, since solving higher-depth problems requires long chains of reasoning rather than additional memorized knowledge. We release the dataset for further research in this area.
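The idea of procedurally generating questions whose reasoning depth is controllable can be illustrated with a toy generator. This is a minimal sketch under stated assumptions, not GeomVerse's actual generator. The function `make_question`, its `hops` and `seed` parameters, and the two hop types (rectangle diagonal, circle circumference) are all hypothetical. Each hop defines a new length `L{i}` in terms of `L{i-1}`, so answering requires chaining every hop. The chain cannot be shortcut by recalling a single memorized fact, which mirrors the abstract's point about higher-depth problems.

```python
import math
import random


def make_question(hops: int, seed: int = 0) -> tuple[str, float]:
    """Build a toy multi-hop geometry question whose answer needs `hops` chained steps.

    Hypothetical generator: each hop defines a new length L{i} in terms of L{i-1},
    so the required reasoning depth grows linearly with `hops`.
    """
    if hops < 1:
        raise ValueError("hops must be at least 1")
    rng = random.Random(seed)  # seeded so the same (hops, seed) pair is reproducible
    side = rng.randint(2, 9)
    length = float(side)
    parts = [f"Square S0 has side L0 = {side}."]
    for i in range(1, hops + 1):
        if rng.random() < 0.5:
            # Hop type 1 (hypothetical): diagonal of a rectangle with sides L{i-1} and w.
            w = rng.randint(2, 9)
            parts.append(f"Rectangle R{i} has sides L{i - 1} and {w}; L{i} is its diagonal.")
            length = math.hypot(length, w)
        else:
            # Hop type 2 (hypothetical): circumference of a circle with diameter L{i-1}.
            parts.append(f"Circle C{i} has diameter L{i - 1}; L{i} is its circumference.")
            length = math.pi * length
    parts.append(f"Find L{hops}, rounded to two decimals.")
    return " ".join(parts), round(length, 2)


question, answer = make_question(hops=3, seed=7)
print(question)
print(answer)
```

Raising `hops` lengthens the chain of dependent quantities without adding new facts to memorize. That is the property a depth axis is meant to isolate.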
AI system solves SAT geometry questions as well as an average 11th-grade student
Scientists have revealed an artificial intelligence (AI) system that can solve SAT geometry questions as well as the average American 11th-grade student. Called GeoS, it combines computer vision to interpret diagrams, natural language processing to read and understand text, and a geometric solver, achieving 49 percent accuracy on official SAT test questions. If these results were extrapolated to the entire Math SAT test, the computer would achieve a score of 500 (out of 800), the average score for 2015, according to the team behind it. GeoS is the first end-to-end system that solves SAT plane geometry problems.
This AI computer can beat students at SAT geometry questions
In 2014, the average SAT test taker correctly answered 49 percent of the test's math questions. A new software program is now close to doing the same. In a paper published Monday, researchers at the Allen Institute for Artificial Intelligence (AI2) and the University of Washington revealed that their artificial intelligence (AI) system, known as GeoSolver, or GeoS for short, is able to answer "unseen and unaltered" geometry problems on par with humans. According to a report released by the College Board, the average SAT math score in 2014 was 513. Although GeoS has only been tested on geometry questions, if its accuracy were extrapolated to the entire math section, GeoS would have scored 500. Using a combination of computer vision and natural language processing, GeoS interprets diagrams and processes text, then feeds the result into a geometric solver that analyzes the input and selects the best multiple-choice answer.
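The final step, in which a solver picks the best multiple-choice answer, can be pictured as scoring each choice against constraints extracted from the text and the diagram. The following is a minimal sketch under stated assumptions and does not reproduce GeoS's actual solver. Constraints are modeled as hypothetical functions that return a signed residual, where 0 means satisfied. The choice with the smallest total squared residual wins. The toy question and the down-weighted "diagram estimate" of 78 degrees are invented for illustration.

```python
from typing import Callable, Sequence

# Hypothetical representation: a constraint maps a candidate answer to a signed
# residual; 0 means the constraint is fully satisfied.
Constraint = Callable[[float], float]


def select_answer(choices: Sequence[float], constraints: Sequence[Constraint]) -> int:
    """Return the index of the choice with the smallest total squared residual."""
    def violation(x: float) -> float:
        return sum(c(x) ** 2 for c in constraints)

    return min(range(len(choices)), key=lambda i: violation(choices[i]))


# Toy question: "In triangle ABC, angle A = 40 and angle B = 60. Find angle C."
CHOICES = [60.0, 70.0, 80.0, 90.0, 100.0]
CONSTRAINTS = [
    lambda c: 40.0 + 60.0 + c - 180.0,  # from the text: angles of a triangle sum to 180
    lambda c: 0.1 * (c - 78.0),         # hypothetical, noisy diagram measurement, down-weighted
]

print(CHOICES[select_answer(CHOICES, CONSTRAINTS)])  # 80.0
```

Squared residuals let an exact textual constraint dominate while a noisy diagram reading still breaks near-ties. The weight of 0.1 is an arbitrary choice for this sketch.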
Diagram Understanding in Geometry Questions
Seo, Min Joon (University of Washington) | Hajishirzi, Hannaneh (University of Washington) | Farhadi, Ali (University of Washington) | Etzioni, Oren (Allen Institute for AI)
Automatically solving geometry questions is a long-standing AI problem. A geometry question typically includes a textual description accompanied by a diagram. The first step in solving geometry questions is diagram understanding, which consists of identifying the visual elements in the diagram along with their locations and geometric properties, and aligning them to the corresponding textual descriptions. In this paper, we present a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data. We show that the method's objective function is submodular, which lets us introduce an efficient diagram-understanding method whose solutions are close to optimal. To empirically evaluate our method, we compile a new dataset of geometry questions (textual descriptions and diagrams) and compare with baselines that use standard vision techniques. Our experimental evaluation shows an F1 boost of more than 17% in identifying visual elements and 25% in aligning visual elements with their textual descriptions.
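Submodularity matters because of a standard result: for a monotone submodular objective under a cardinality constraint, the simple greedy algorithm is guaranteed at least a (1 − 1/e) fraction of the optimum. This is a minimal sketch of that greedy idea on a toy objective and is not the paper's actual formulation. Its score counts the pixels covered by the selected primitives and adds a bonus for each primitive the text mentions. The candidate primitives, pixel ids, bonus values, and the names `objective` and `greedy_select` are all hypothetical. Pixel coverage is monotone submodular, and adding a non-negative per-element bonus keeps it so.

```python
from typing import Dict, List, Set


def objective(selected: List[str], pixels: Dict[str, Set[int]],
              text_bonus: Dict[str, float]) -> float:
    """Toy score: pixels covered by the selected primitives plus a per-primitive text bonus.

    Coverage is monotone submodular; adding a non-negative per-element bonus keeps it so.
    """
    covered: Set[int] = set()
    for p in selected:
        covered |= pixels[p]
    return len(covered) + sum(text_bonus.get(p, 0.0) for p in selected)


def greedy_select(pixels: Dict[str, Set[int]], text_bonus: Dict[str, float],
                  k: int) -> List[str]:
    """Pick up to k primitives, each time adding the one with the largest marginal gain."""
    selected: List[str] = []
    current = 0.0
    remaining = sorted(pixels)  # sorted for deterministic tie-breaking
    while len(selected) < k and remaining:
        best, best_gain = None, 0.0
        for cand in remaining:
            gain = objective(selected + [cand], pixels, text_bonus) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no remaining candidate improves the score; stop early
            break
        selected.append(best)
        remaining.remove(best)
        current = objective(selected, pixels, text_bonus)
    return selected


# Hypothetical candidate primitives detected in a diagram, as sets of pixel ids.
PIXELS = {
    "line AB": {1, 2, 3, 4},
    "line BC": {4, 5, 6},
    "circle O": {7, 8, 9, 10, 11},
    "spurious": {3, 4},  # a false detection that overlaps line AB
}
# Hypothetical bonus for primitives that the question text mentions.
TEXT_BONUS = {"line AB": 1.0, "line BC": 1.0, "circle O": 1.0}

print(greedy_select(PIXELS, TEXT_BONUS, k=4))  # ['circle O', 'line AB', 'line BC']
```

The spurious detection is never selected. Once the true lines are in place its marginal gain drops to zero, which is the diminishing-returns behavior that submodularity describes.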