Rescaled Influence Functions: Accurate Data Attribution in High Dimension
–Neural Information Processing Systems
How does the training data affect a model's behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IF). IFs utilize a first-order Taylor approximation to efficiently predict the effect of removing a set of samples from the training set without retraining the model, and are used in a wide variety of machine learning applications. However, especially in the high-dimensional regime (# params Ω(# samples)), they are often imprecise and tend to underestimate the effect of sample removals, even for simple models such as logistic regression. We present rescaled influence functions (RIF), a tool for data attribution which can be used as a dropin replacement for influence functions, with little computational overhead but significant improvement in accuracy. We compare IF and RIF on a range of realworld datasets, showing that RIFs offer significantly better predictions in practice, and present a theoretical analysis explaining this improvement. Finally, we present a simple class of data poisoning attacks that would fool IF-based detections but would be detected by RIF.
Neural Information Processing Systems
Jun-22-2026, 23:07:57 GMT
- Country:
- North America > United States (1.00)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Research Report
- Industry:
- Transportation (0.68)
- Technology: