Gaps or Hallucinations? Gazing into Machine-Generated Legal Analysis for Fine-grained Text Evaluations

Hou, Abe Bohan, Jurayj, William, Holzenberger, Nils, Blair-Stanek, Andrew, Van Durme, Benjamin

Sep-15-2024–arXiv.org Artificial Intelligence

Large Language Models (LLMs) show promise as a writing aid for professionals performing legal analyses. However, LLMs often hallucinate in this setting, in ways that are difficult for non-professionals and existing text evaluation metrics to recognize. In this work, we pose the question: when can machine-generated legal analysis be evaluated as acceptable? We introduce the neutral notion of gaps, as opposed to hallucinations in a strict erroneous sense, to refer to the differences between human-written and machine-generated legal analysis. Gaps do not always equate to invalid generation. Working with legal experts, we consider the CLERC generation task proposed in Hou et al. (2024b), leading to a taxonomy of gaps, a fine-grained detector for predicting gap categories, and an annotated dataset for automatic evaluation. Our best detector achieves a 67% F1 score and 80% precision on the test set. Employing this detector as an automated metric on legal analysis generated by SOTA LLMs, we find that around 80% contain hallucinations of different kinds.

hallucination, legal analysis, mismatch

Country:
- Asia
  - British Indian Ocean Territory > Diego Garcia (0.04)
  - Middle East > Saudi Arabia
    - Asir Province > Abha (0.04)
- North America
  - Puerto Rico > Mayagüez
    - Mayagüez (0.04)
  - United States
    - New York > New York County
      - New York City (0.04)
    - California (0.04)
    - Pennsylvania (0.04)
    - Iowa (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - North Carolina (0.04)
    - Maryland (0.04)
    - New Jersey > Essex County
      - Newark (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)

Genre:
- Research Report (0.50)

Industry:
- Government > Regional Government
  - North America Government > United States Government (1.00)
- Law
  - Government & the Courts (1.00)
  - Litigation (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.48)
  - Natural Language > Large Language Model (1.00)
