AnnoCaseLaw: A Richly-Annotated Dataset For Benchmarking Explainable Legal Judgment Prediction

Sesodia, Magnus; Petrova, Alina; Armour, John; Lukasiewicz, Thomas; Camburu, Oana-Maria; Dokania, Puneet K.; Torr, Philip; de Witt, Christian Schroeder

Feb-28-2025–arXiv.org Artificial Intelligence

Legal systems worldwide continue to struggle with overwhelming caseloads, limited judicial resources, and increasingly complex legal proceedings. Artificial intelligence (AI) offers a promising solution, and Legal Judgment Prediction (LJP), the task of predicting a court's decision from the facts of a case, has emerged as a key research area. However, existing datasets often formulate LJP unrealistically and do not reflect its true difficulty. They also lack the high-quality annotation that is essential for legal reasoning and explainability. To address these shortcomings, we introduce AnnoCaseLaw, a first-of-its-kind dataset of 471 meticulously annotated U.S. Appeals Court negligence cases. Each case is enriched with comprehensive, expert-labeled annotations that highlight the key components of judicial decision making, along with relevant legal concepts. Our dataset lays the groundwork for more human-aligned, explainable LJP models. We define three legally relevant tasks: (1) judgment prediction; (2) concept identification; and (3) automated case annotation, and we establish performance baselines using industry-leading large language models (LLMs). Our results demonstrate that LJP remains a formidable task, with the application of legal precedent proving particularly difficult. Code and data are available at https://github.com/anonymouspolar1/annocaselaw.
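The abstract names the benchmark tasks but not their data format or scoring. As a rough illustration only, the sketch below assumes a hypothetical per-case record (`AnnotatedCase`, with invented field names), treats judgment prediction as binary classification over assumed "affirm"/"reverse" outcome labels scored by accuracy and macro-F1, and treats concept identification as a set-overlap F1 between gold and predicted concept labels. None of these names, labels, or metrics are taken from the AnnoCaseLaw repository; consult it for the actual schema and evaluation protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical record for one annotated case; the real AnnoCaseLaw schema may differ.
@dataclass
class AnnotatedCase:
    case_id: str
    facts: str                                          # text shown to the model as input
    outcome: str                                        # gold label, assumed "affirm" or "reverse"
    concepts: set[str] = field(default_factory=set)     # assumed gold legal-concept labels


def accuracy(gold: list[str], pred: list[str]) -> float:
    """Fraction of cases whose predicted outcome equals the gold outcome."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: list[str], pred: list[str], labels: list[str]) -> float:
    """Unweighted mean of per-label F1 scores (an assumed metric choice)."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def concept_set_f1(gold: set[str], pred: set[str]) -> float:
    """F1 between gold and predicted concept sets for a single case."""
    if not gold and not pred:
        return 1.0  # nothing to find and nothing predicted counts as a perfect match
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


if __name__ == "__main__":
    cases = [
        AnnotatedCase("c1", "...", "affirm"),
        AnnotatedCase("c2", "...", "reverse"),
        AnnotatedCase("c3", "...", "affirm"),
        AnnotatedCase("c4", "...", "reverse"),
    ]
    gold = [c.outcome for c in cases]
    pred = ["affirm", "affirm", "affirm", "reverse"]  # made-up model outputs
    print(accuracy(gold, pred))                                  # 0.75
    print(round(macro_f1(gold, pred, ["affirm", "reverse"]), 4))  # 0.7333
```

Macro-F1 is shown alongside accuracy because an outcome distribution skewed toward one label can make accuracy look better than the model's ability to predict the minority outcome; whether the benchmark uses either metric is not stated in the abstract.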



Country:
- Asia
  - Middle East > UAE (0.14)
  - Thailand (0.14)
- Europe
  - Belgium (0.14)
  - Denmark (0.14)
  - Italy (0.14)
  - Spain (0.14)
  - United Kingdom (0.28)
- North America > United States (1.00)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Government > Regional Government
  - North America Government > United States Government (0.46)
- Law > Litigation (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)
  - Natural Language > Large Language Model (1.00)