Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
Neural Information Processing Systems
Emerging ethical approaches have attempted to filter pretraining material, but these approaches have been ad hoc and have failed to take context into account. We offer an approach to filtering grounded in law, a field that has directly addressed the tradeoffs involved in filtering material.
Aug-18-2025, 08:08:56 GMT
- Country:
  - Africa > Nigeria (0.04)
  - North America
    - United States
      - Virginia (0.04)
      - California (0.04)
      - Wisconsin (0.04)
      - New Jersey (0.04)
      - Louisiana (0.04)
      - Iowa (0.04)
      - Washington > King County
        - Seattle (0.04)
    - Canada
      - British Columbia (0.04)
      - Ontario (0.04)
  - Europe
    - Germany (0.28)
    - United Kingdom (0.04)
    - Netherlands (0.04)
    - France (0.04)
    - Croatia (0.04)
    - Belgium (0.04)
    - Italy > Tuscany
      - Florence (0.04)
  - Asia
- Genre:
  - Research Report
    - New Finding (0.67)
    - Experimental Study (0.67)
- Industry:
  - Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Health & Medicine (1.00)
  - Education (1.00)
  - Law
    - Statutes (1.00)
    - Litigation (1.00)
    - International Law (1.00)
    - Government & the Courts (1.00)
    - Criminal Law (1.00)
    - Civil Rights & Constitutional Law (1.00)
  - Government
- Technology:
  - Information Technology
    - Security & Privacy (1.00)
    - Information Management (1.00)
    - Communications (1.00)
    - Data Science > Data Mining (0.67)
    - Artificial Intelligence
      - Natural Language (1.00)
      - Machine Learning (1.00)
      - Representation & Reasoning (0.92)