Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
Neural Information Processing Systems
Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and failed to take context into account. We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material.
Nov-15-2025, 22:56:23 GMT
- Country:
  - Africa > Nigeria (0.04)
  - Asia
  - Europe
    - Croatia (0.04)
    - France (0.04)
    - Germany (0.28)
    - Italy > Tuscany
      - Florence (0.04)
    - Netherlands (0.04)
    - United Kingdom (0.04)
  - North America
    - Canada
      - British Columbia (0.04)
      - Ontario (0.04)
    - United States
      - California (0.04)
      - Iowa (0.04)
      - New Jersey (0.04)
      - Virginia (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Government
  - Health & Medicine (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Law
    - Civil Rights & Constitutional Law (1.00)
    - Criminal Law (1.00)
    - Government & the Courts (0.93)
    - International Law (0.67)
    - Litigation (1.00)
    - Statutes (1.00)
  - Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (0.67)
    - Communications (1.00)
    - Data Science > Data Mining (0.68)
    - Information Management (0.68)
    - Security & Privacy (1.00)