Rewriting History: A Recipe for Interventional Analyses to Study Data Effects on Model Behavior
Nadkarni, Rahul, Elazar, Yanai, Gonen, Hila, Smith, Noah A.
arXiv.org Artificial Intelligence
We present an experimental recipe for studying the relationship between training data and language model (LM) behavior. We outline steps for intervening on data batches (i.e., "rewriting history") and then retraining model checkpoints over that data to test hypotheses relating data to behavior. Our recipe breaks down such an intervention into stages that include selecting evaluation items from a benchmark that measures model behavior, matching relevant documents to those items, and modifying those documents before retraining and measuring the effects. We demonstrate the utility of our recipe through case studies on factual knowledge acquisition in LMs, using both cooccurrence statistics and information retrieval methods to identify documents that might contribute to knowledge learning. Our results supplement past observational analyses that link cooccurrence to model behavior, while demonstrating that extant methods for identifying relevant training documents do not fully explain an LM's ability to correctly answer knowledge questions. Overall, we outline a recipe that researchers can follow to test further hypotheses about how training data affects model behavior. Our code is made publicly available to promote future work.
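The matching and modification stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the names `EvalItem`, `match_documents`, and `intervene`, the case-insensitive substring test used as a stand-in for cooccurrence matching, and the choice to replace each matched document with a filler string are all assumptions made for illustration. Retraining and re-evaluation are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalItem:
    """One benchmark item: a subject entity and its expected answer (hypothetical schema)."""
    subject: str
    answer: str


def cooccurs(doc: str, item: EvalItem) -> bool:
    # Assumption: cooccurrence means both strings appear somewhere in the
    # document, compared case-insensitively as plain substrings.
    text = doc.lower()
    return item.subject.lower() in text and item.answer.lower() in text


def match_documents(batch: list[str], items: list[EvalItem]) -> list[int]:
    """Return indices of documents in which any item's subject and answer cooccur."""
    return [i for i, doc in enumerate(batch) if any(cooccurs(doc, it) for it in items)]


def intervene(batch: list[str], items: list[EvalItem], filler: str = "[REMOVED]") -> list[str]:
    """Return a copy of the batch with matched documents replaced by a filler.

    Replacing (rather than dropping) keeps the batch size unchanged; this is
    one possible modification, chosen here only for illustration.
    """
    matched = set(match_documents(batch, items))
    return [filler if i in matched else doc for i, doc in enumerate(batch)]
```

Under these assumptions, a checkpoint would then be retrained on the output of `intervene` and re-scored on the selected evaluation items, so that its behavior can be compared against a checkpoint trained on the original batch.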
Oct-17-2025
- Country:
- Asia
- China > Hong Kong (0.04)
- Japan > Kyūshū & Okinawa
- Kyūshū > Miyazaki Prefecture > Miyazaki (0.04)
- Middle East
- Jordan (0.04)
- Saudi Arabia > Asir Province
- Abha (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Russia (0.04)
- Singapore (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- France (0.05)
- Greece (0.04)
- Romania (0.04)
- Russia (0.04)
- Spain (0.04)
- United Kingdom > England (0.04)
- North America
- Canada
- British Columbia (0.04)
- Ontario > Toronto (0.04)
- Cuba (0.04)
- Dominican Republic (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- United States
- New Mexico > Bernalillo County
- Albuquerque (0.04)
- Washington > King County
- Seattle (0.04)
- South America
- Guyana (0.04)
- Peru > Madre de Dios Department
- Tambopata Province > Puerto Maldonado (0.04)
- Uruguay > Maldonado
- Maldonado (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Education (0.46)
- Technology: