LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
Guha, Neel, Nyarko, Julian, Ho, Daniel E., Ré, Christopher, Chilton, Adam, Narayana, Aditya, Chohlas-Wood, Alex, Peters, Austin, Waldon, Brandon, Rockmore, Daniel N., Zambrano, Diego, Talisman, Dmitry, Hoque, Enam, Surani, Faiz, Fagan, Frank, Sarfaty, Galit, Dickinson, Gregory M., Porat, Haggai, Hegland, Jason, Wu, Jessica, Nudell, Joe, Niklaus, Joel, Nay, John, Choi, Jonathan H., Tobia, Kevin, Hagan, Margaret, Ma, Megan, Livermore, Michael, Rasumov-Rahe, Nikon, Holzenberger, Nils, Kolt, Noam, Henderson, Peter, Rehaag, Sean, Goel, Sharad, Gao, Shang, Williams, Spencer, Gandhi, Sunny, Zur, Tom, Iyer, Varun, Li, Zehua
arXiv.org Artificial Intelligence
The advent of large language models (LLMs) and their adoption by the legal community have given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
Aug-20-2023
- Country:
- Africa > Central African Republic (0.04)
- Europe
- Denmark (0.04)
- Germany (0.04)
- Ireland (0.13)
- Portugal (0.04)
- Spain > Galicia
- Madrid (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Hampshire > Southampton (0.04)
- Oxfordshire > Oxford (0.04)
- North America
- Canada
- Alberta
- Census Division No. 11 > Sturgeon County (0.04)
- Census Division No. 13 > Westlock County (0.04)
- Newfoundland and Labrador > Labrador (0.04)
- Ontario
- National Capital Region > Ottawa (0.13)
- Toronto (0.13)
- Costa Rica (0.04)
- Mexico (0.04)
- Puerto Rico (0.04)
- United States
- Montana (0.04)
- Wyoming (0.04)
- Michigan > Wayne County
- Detroit (0.04)
- Pennsylvania (0.04)
- Colorado (0.04)
- Virginia (0.04)
- Idaho (0.04)
- Nebraska (0.04)
- Louisiana (0.04)
- New Jersey (0.04)
- Oklahoma (0.04)
- New Mexico (0.04)
- Missouri > Jackson County
- Kansas City (0.04)
- Rhode Island (0.04)
- New Hampshire (0.04)
- Tennessee (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Utah > Salt Lake County (0.04)
- California > Santa Clara County
- Palo Alto (0.04)
- Kansas (0.04)
- Connecticut (0.04)
- South Dakota (0.04)
- Arizona (0.04)
- New York (0.04)
- Texas (0.14)
- Indiana (0.04)
- Oregon > Marion County
- Salem (0.04)
- Maine (0.04)
- Alaska (0.04)
- South Carolina (0.04)
- West Virginia (0.04)
- Pacific Ocean > North Pacific Ocean
- San Francisco Bay > Golden Gate (0.04)
- South America > Bolivia (0.04)
- Genre:
- Financial News (1.00)
- Overview (1.00)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Workflow (0.92)
- Industry:
- Aerospace & Defense > Aircraft (0.67)
- Transportation
- Banking & Finance
- Financial Services (0.93)
- Insurance (0.92)
- Mergers & Acquisitions (0.67)
- Trading (1.00)
- Education
- Assessment & Standards (0.67)
- Curriculum > Subject-Specific Education (0.67)
- Educational Setting > Higher Education (0.67)
- Government
- Immigration & Customs (1.00)
- Regional Government > North America Government
- United States Government (1.00)
- Tax (1.00)
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area (1.00)
- Law
- Business Law (1.00)
- Statutes (1.00)
- Family Law (0.67)
- Litigation (1.00)
- Civil Rights & Constitutional Law (1.00)
- Government & the Courts (1.00)
- Criminal Law (1.00)
- Intellectual Property & Technology Law (1.00)
- Taxation Law (0.92)
- Information Technology
- Security & Privacy (1.00)
- Services (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Technology: