Lost in the Logic: An Evaluation of Large Language Models' Reasoning Capabilities on LSAT Logic Games
–arXiv.org Artificial Intelligence
In this thesis, I evaluate the performance of Large Language Models (LLMs) on the Law School Admission Test (LSAT), specifically the Logic Games section of the test. I focus on this section because it presents a complex logical reasoning task and is thus a valuable source of data for evaluating how modern, increasingly capable LLMs handle hard logical reasoning tasks. I construct a dataset of LSAT logic games and their associated metadata, and extensively evaluate LLMs' performance in a Chain-of-Thought prompting setting. Given the weak performance in this setting, I explore other prompting frameworks on a smaller subset of the dataset, adapting ideas from Reflexion to this task. This yields substantially improved accuracy of 70 percent for GPT-4 and 46 percent for GPT-3.5 on this data subset, highlighting the capacity of LLMs to revise their logical errors despite initially weak performance. Finally, I analyze the types of logic games that models perform better or worse on, as well as the types of logical errors I observe through human annotation, providing detailed insights into the logical reasoning capabilities of LLMs.
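The Reflexion-style revision framework mentioned above can be pictured as a simple answer-verify-reflect loop. The sketch below is illustrative only, not the thesis's actual implementation; the function names (`reflexion_loop`, `verify`) and the plain-string prompts are assumptions, with the model and verifier supplied as caller-provided callables so the example stays self-contained.

```python
# Illustrative Reflexion-style self-correction loop (a sketch, not the
# thesis's implementation): the model answers with chain-of-thought, a
# verifier flags wrong answers, and the model retries after reflecting.
def reflexion_loop(model, question, verify, max_rounds=3):
    """`model` maps a prompt string to an answer string; `verify` returns
    True when the answer is acceptable. Both are caller-supplied stubs."""
    prompt = f"{question}\nThink step by step, then give a final answer."
    history = []
    for _ in range(max_rounds):
        answer = model(prompt)
        history.append(answer)
        if verify(answer):
            return answer, history
        # Ask the model to reflect on its mistake before retrying.
        prompt = (f"{question}\nYour previous answer was: {answer}\n"
                  "It was incorrect. Reflect on the error, then answer again.")
    return history[-1], history

# Toy demo with a scripted "model" that corrects itself on the second try.
if __name__ == "__main__":
    replies = iter(["B", "D"])
    answer, attempts = reflexion_loop(lambda p: next(replies),
                                      "Which slot holds F?",
                                      lambda a: a == "D")
    print(answer, len(attempts))  # D 2
```

In practice the verifier might be the answer key (for evaluation) or a model-based self-check; the loop structure is the same either way.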
Sep-23-2024