Interrogating LLM design under a fair learning doctrine
Wei, Johnny Tian-Zheng; Wang, Maggie; Godbole, Ameya; Choi, Jonathan H.; Jia, Robin
arXiv.org Artificial Intelligence
The current discourse on large language models (LLMs) and copyright largely takes a "behavioral" perspective, focusing on model outputs and evaluating whether they are substantially similar to training data. However, substantial similarity is difficult to define algorithmically and a narrow focus on model outputs is insufficient to address all copyright risks. In this interdisciplinary work, we take a complementary "structural" perspective and shift our focus to how LLMs are trained. We operationalize a notion of "fair learning" by measuring whether any training decision substantially affected the model's memorization. As a case study, we deconstruct Pythia, an open-source LLM, and demonstrate the use of causal and correlational analyses to make factual determinations about Pythia's training decisions. By proposing a legal standard for fair learning and connecting memorization analyses to this standard, we identify how judges may advance the goals of copyright law through adjudication. Finally, we discuss how a fair learning standard might evolve to enhance its clarity by becoming more rule-like and incorporating external technical guidelines.
Feb-22-2025
- Genre:
- Research Report > Experimental Study (1.00)