LLMs Provide Unstable Answers to Legal Questions
Blair-Stanek, Andrew, Van Durme, Benjamin
arXiv.org Artificial Intelligence
An LLM is stable if it reaches the same conclusion when asked the identical question multiple times. We find that leading LLMs like gpt-4o, claude-3.5, and gemini-1.5 are unstable when answering hard legal questions, even when made as deterministic as possible by setting temperature to 0. We curate and release a novel dataset of 500 legal questions distilled from real cases, each involving two parties and comprising facts, competing legal arguments, and the question of which party should prevail. Given the exact same question, the LLMs sometimes say one party should win and at other times say the other party should win. This instability has implications for the growing number of legal AI products, legal processes, and lawyers relying on these LLMs.
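The stability notion the abstract defines can be sketched as a simple repeated-query check. This is a minimal illustration, not the paper's actual harness: `query_model` stands in for any LLM API call made at temperature 0, and `mock_model` is a hypothetical stand-in that deliberately varies to show what instability looks like.

```python
from collections import Counter

def is_stable(query_model, question, trials=10):
    """Ask the identical question repeatedly; stable iff every answer matches."""
    answers = [query_model(question) for _ in range(trials)]
    return len(set(answers)) == 1, Counter(answers)

# Hypothetical stand-in for a real LLM call at temperature 0.
# Real backends can still vary run-to-run, which is the instability at issue.
def mock_model(question, _state={"n": 0}):
    _state["n"] += 1
    return "Party A" if _state["n"] % 3 else "Party B"

stable, counts = is_stable(mock_model, "Which party should prevail?", trials=9)
print(stable, dict(counts))  # False {'Party A': 6, 'Party B': 3}
```

With a real API client, `query_model` would wrap a chat-completion call with temperature set to 0 and extract the predicted winner from the response text.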
Jan-28-2025
- Country:
- North America > United States > Maryland (0.14)
- Genre:
- Research Report (1.00)