BLT: Can Large Language Models Handle Basic Legal Text?
Blair-Stanek, Andrew; Holzenberger, Nils; Van Durme, Benjamin
arXiv.org Artificial Intelligence
We find that the best publicly available LLMs, like GPT-4 and PaLM 2, currently perform poorly at the basic text handling required of lawyers or paralegals, such as looking up the text at a line of a witness deposition or at a subsection of a contract. We introduce a benchmark to quantify this poor performance, which casts doubt on whether current LLMs can be relied on as-is for legal practice. Fine-tuning for these tasks brings an older LLM to near-perfect performance on our test set and also raises performance on a related legal task. This stark result highlights the need for more domain expertise in LLM training.
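For contrast, the deposition lookup task described above is something a short deterministic program can do exactly once the transcript's numbering is parsed. The sketch below is illustrative only and is not the benchmark's data format or the authors' code. It assumes a hypothetical plain-text layout in which each page starts with a "Page N" header and each transcript line begins with its line number (depositions are commonly cited by page and line); the names index_deposition, lookup and SAMPLE are made up for this example.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical layout (assumption, not the benchmark's format):
# a "Page N" header line, then lines that start with their line number.
PAGE_RE = re.compile(r"^\s*Page\s+(\d+)\s*$")
LINE_RE = re.compile(r"^\s*(\d+)\s*(.*)$")


def index_deposition(text: str) -> Dict[Tuple[int, int], str]:
    """Map (page, line) to the text on that transcript line."""
    index: Dict[Tuple[int, int], str] = {}
    page: Optional[int] = None
    for raw in text.splitlines():
        page_match = PAGE_RE.match(raw)
        if page_match:
            # A new page header resets the current page number.
            page = int(page_match.group(1))
            continue
        line_match = LINE_RE.match(raw)
        if line_match and page is not None:
            # Store the text after the line number, without trailing spaces.
            index[(page, int(line_match.group(1)))] = line_match.group(2).rstrip()
    return index


def lookup(text: str, page: int, line: int) -> Optional[str]:
    """Return the text at page:line, or None if that line does not exist."""
    return index_deposition(text).get((page, line))


# Made-up two-page excerpt used only to exercise the sketch.
SAMPLE = """\
Page 1
  1  Q. Please state your name for the record.
  2  A. Jane Doe.
Page 2
  1  Q. Where were you on the night of March 3?
  2  A. At home.
"""

print(lookup(SAMPLE, 2, 1))
```

Rebuilding the index on every call keeps the sketch short; for many queries against one transcript, one would build the index once and reuse it. An analogous lookup for contract subsections would need a parser for the contract's own numbering scheme, which is left out here.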
November 16, 2023