Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
Neural Information Processing Systems
Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and failed to take context into account. We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material.
Aug-18-2025, 08:08:52 GMT
- Country:
- Africa > Nigeria (0.04)
- North America
- United States
- Virginia (0.04)
- New Jersey (0.04)
- Iowa (0.04)
- California (0.04)
- Washington > King County
- Seattle (0.04)
- Canada
- British Columbia (0.04)
- Ontario (0.04)
- Europe
- Germany (0.28)
- United Kingdom (0.04)
- Netherlands (0.04)
- France (0.04)
- Croatia (0.04)
- Italy > Tuscany
- Florence (0.04)
- Asia
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Law
- Statutes (1.00)
- Litigation (1.00)
- Criminal Law (1.00)
- Civil Rights & Constitutional Law (1.00)
- Government & the Courts (0.93)
- International Law (0.67)
- Government
- Technology:
- Information Technology
- Security & Privacy (1.00)
- Communications (1.00)
- Data Science > Data Mining (0.68)
- Information Management (0.68)
- Artificial Intelligence
- Natural Language (1.00)
- Machine Learning (1.00)
- Representation & Reasoning (0.67)