Rescaled Influence Functions: Accurate Data Attribution in High Dimension

Jun-10-2025–arXiv.org Machine Learning

How does the training data affect a model's behavior? This is the question we seek to answer with data attribution. The leading practical approaches to data attribution are based on influence functions (IFs). IFs use a first-order Taylor approximation to efficiently predict the effect of removing a set of samples from the training set without retraining the model, and are used in a wide variety of machine learning applications. However, especially in the high-dimensional regime ($\#\text{params} \geq \Omega(\#\text{samples})$), they are often imprecise and tend to underestimate the effect of sample removals, even for simple models such as logistic regression. We present rescaled influence functions (RIFs), a new tool for data attribution that can be used as a drop-in replacement for influence functions, with little computational overhead but a significant improvement in accuracy. We compare IFs and RIFs on a range of real-world datasets, showing that RIFs offer significantly better predictions in practice, and present a theoretical analysis explaining this improvement. Finally, we present a simple class of data poisoning attacks that would fool IF-based detection but would be detected by RIFs.
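To make the first-order approximation concrete, here is a minimal sketch of the classic influence-function estimate for L2-regularized logistic regression, compared against actually retraining without the removed samples. It illustrates only the standard IF described above, not the paper's RIF, whose exact form is not given in this abstract. The function names (`fit_logreg`, `influence_removal`), the synthetic data, and the regularization strength `lam` are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, lam, iters=50):
    """Fit L2-regularized logistic regression (summed loss) with Newton's method."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + lam * theta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta

def influence_removal(X, y, theta, lam, removed):
    """First-order estimate of the parameter change when rows `removed` are dropped.

    Uses the Hessian of the FULL training objective at `theta`, which is what
    makes this the classic influence-function estimate: delta ~= H^{-1} g_S,
    where g_S is the summed loss gradient of the removed samples.
    """
    p = sigmoid(X @ theta)
    hess = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(X.shape[1])
    g_removed = X[removed].T @ (p[removed] - y[removed])
    return np.linalg.solve(hess, g_removed)

# Synthetic data (an assumption for illustration; not from the paper).
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = (rng.random(n) < sigmoid(X @ rng.normal(size=d))).astype(float)

theta = fit_logreg(X, y, lam)
removed = np.array([0, 1, 2])
predicted = influence_removal(X, y, theta, lam, removed)

# Ground truth: retrain without the removed rows and measure the real change.
keep = np.setdiff1d(np.arange(n), removed)
actual = fit_logreg(X[keep], y[keep], lam) - theta

print("predicted change norm:", np.linalg.norm(predicted))
print("actual change norm:   ", np.linalg.norm(actual))
```

In this low-dimensional setting (many more samples than parameters) the estimate should track retraining closely. Raising `d` toward `n` in the sketch is one way to explore the high-dimensional regime in which the abstract says IFs become imprecise.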

artificial intelligence, influence function, machine learning
