BLT: Can Large Language Models Handle Basic Legal Text?

Blair-Stanek, Andrew; Holzenberger, Nils; Van Durme, Benjamin

Nov-16-2023–arXiv.org Artificial Intelligence

We find that the best publicly available LLMs, such as GPT-4 and PaLM 2, currently perform poorly at the basic text handling required of lawyers and paralegals, such as looking up the text at a given line of a witness deposition or at a given subsection of a contract. We introduce a benchmark to quantify this poor performance, which casts doubt on the reliability of current LLMs, used as-is, for legal practice. Finetuning on these tasks brings an older LLM to near-perfect performance on our test set and also raises its performance on a related legal task. This stark result highlights the need for more domain expertise in LLM training.
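The two lookups named in the abstract are trivial for a deterministic program, which is what makes poor LLM performance on them notable. The following is a minimal sketch of what a correct answer looks like, assuming a hypothetical toy format (a deposition as pages of numbered lines, a contract as nested labeled subsections). It is not the benchmark's actual data format, and the function names are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy format: page number -> list of that page's lines (1-indexed).
Transcript = Dict[int, List[str]]

# Hypothetical toy format: subsection label -> (text, nested subsections).
Section = Dict[str, Tuple[str, "Section"]]


def lookup_transcript_line(transcript: Transcript, page: int, line: int) -> str:
    """Return the text at the given page and line of a deposition transcript."""
    lines = transcript[page]
    if not 1 <= line <= len(lines):
        raise IndexError(f"page {page} has no line {line}")
    return lines[line - 1]  # line numbers are 1-indexed, Python lists are 0-indexed


def parse_subsection_cite(cite: str) -> List[str]:
    """Split a citation such as "(b)(1)" into its labels: ["b", "1"]."""
    return re.findall(r"\(([^)]+)\)", cite)


def lookup_subsection(section: Section, cite: str) -> str:
    """Walk down the nested subsections named by the citation and return the text."""
    text, node = "", section
    for label in parse_subsection_cite(cite):
        text, node = node[label]  # KeyError if the cited subsection does not exist
    return text


if __name__ == "__main__":
    depo: Transcript = {
        1: ["Q. Please state your name.", "A. John Doe."],
        2: ["Q. Where were you on May 1?", "A. At the office.", "Q. Who else was there?"],
    }
    contract: Section = {
        "a": ("The Seller shall deliver the goods.", {}),
        "b": ("Payment terms.", {
            "1": ("Buyer shall pay within 30 days.", {}),
            "2": ("Late payments accrue interest.", {}),
        }),
    }
    print(lookup_transcript_line(depo, 2, 3))   # Q. Who else was there?
    print(lookup_subsection(contract, "(b)(1)"))  # Buyer shall pay within 30 days.
```

Because each lookup is a direct index into structured data, there is exactly one correct answer, which is what allows a benchmark of this kind to score model outputs unambiguously.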

gpt-4, synthetic section, transcript

Country:
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- North America > United States
  - Massachusetts (0.04)
  - Maryland (0.04)

Genre:
- Research Report (0.40)

Industry:
- Law
  - Litigation (0.94)
  - Statutes (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.45)