Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
Neural Information Processing Systems
Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and failed to take context into account. We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material.
Aug-18-2025, 08:08:52 GMT
- Country:
  - Africa > Nigeria (0.04)
  - Asia
  - Europe
    - Croatia (0.04)
    - France (0.04)
    - Germany (0.28)
    - Italy > Tuscany
      - Florence (0.04)
    - Netherlands (0.04)
    - United Kingdom (0.04)
  - North America
    - Canada
      - British Columbia (0.04)
      - Ontario (0.04)
    - United States
      - California (0.04)
      - Iowa (0.04)
      - New Jersey (0.04)
      - Virginia (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Government > Regional Government
  - Health & Medicine (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Law
    - Civil Rights & Constitutional Law (1.00)
    - Criminal Law (1.00)
    - Government & the Courts (0.93)
    - Litigation (1.00)
    - Statutes (1.00)
  - Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Issues > Social & Ethical Issues (0.67)
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (0.67)
    - Communications (1.00)
    - Data Science > Data Mining (0.68)
    - Security & Privacy (1.00)