Lost in the Logic: An Evaluation of Large Language Models' Reasoning Capabilities on LSAT Logic Games
–arXiv.org Artificial Intelligence
In this thesis, I evaluate the performance of Large Language Models (LLMs) on the Law School Admission Test (LSAT), specifically the Logic Games section of the test. I focus on this section because it presents a complex logical reasoning task and is thus a valuable source of data for evaluating how modern, increasingly capable LLMs handle hard logical reasoning tasks. I construct a dataset of LSAT logic games and their associated metadata, and extensively evaluate LLMs' performance in a Chain-of-Thought prompting setting. Given the weak performance in this setting, I explore other prompting frameworks on a smaller subset of the dataset, adapting ideas from Reflexion to this task. This yields substantially improved accuracy of 70 percent for GPT-4 and 46 percent for GPT-3.5 on this data subset, highlighting the capacity of LLMs to revise their logical errors despite initially weak performance. Finally, I analyze the types of logic games that models perform better or worse on, as well as the types of logical errors I observe through human annotation, providing detailed insights into the logical reasoning capabilities of LLMs.
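The Reflexion-style revision framework mentioned above can be pictured as a simple answer-verify-reflect loop. The sketch below is illustrative only, not the thesis's actual implementation; the function names (`reflexion_loop`, `verify`) and the plain-string prompts are assumptions, with the model and verifier supplied as caller-provided callables so the example stays self-contained.

```python
# Illustrative Reflexion-style self-correction loop (a sketch, not the
# thesis's implementation): the model answers with chain-of-thought, a
# verifier flags wrong answers, and the model retries after reflecting.
def reflexion_loop(model, question, verify, max_rounds=3):
    """`model` maps a prompt string to an answer string; `verify` returns
    True when the answer is acceptable. Both are caller-supplied stubs."""
    prompt = f"{question}\nThink step by step, then give a final answer."
    history = []
    for _ in range(max_rounds):
        answer = model(prompt)
        history.append(answer)
        if verify(answer):
            return answer, history
        # Ask the model to reflect on its mistake before retrying.
        prompt = (f"{question}\nYour previous answer was: {answer}\n"
                  "It was incorrect. Reflect on the error, then answer again.")
    return history[-1], history

# Toy demo with a scripted "model" that corrects itself on the second try.
if __name__ == "__main__":
    replies = iter(["B", "D"])
    answer, attempts = reflexion_loop(lambda p: next(replies),
                                      "Which slot holds F?",
                                      lambda a: a == "D")
    print(answer, len(attempts))  # D 2
```

In practice the verifier might be the answer key (for evaluation) or a model-based self-check; the loop structure is the same either way.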
Sep-23-2024