A Simple and Efficient Baseline for Data Attribution on Images

Singla, Vasu; Sandoval-Segura, Pedro; Goldblum, Micah; Geiping, Jonas; Goldstein, Tom

Nov-3-2023, arXiv.org Artificial Intelligence

Data attribution methods play a crucial role in understanding machine learning models: they provide insight into which training data points are most responsible for a model's outputs during deployment. However, current state-of-the-art approaches require a large ensemble of as many as 300,000 models to attribute model predictions accurately. These approaches are therefore computationally expensive, memory intensive, and hard to scale to large models or datasets. In this work, we propose a minimalist baseline that performs data attribution in the feature space of a backbone pretrained via self-supervised learning. Our method is model-agnostic and scales easily to large datasets. We show results on CIFAR-10 and ImageNet, where it rivals or outperforms state-of-the-art approaches at a fraction of the compute or memory cost. Contrary to prior work, our results reinforce the intuition that a model's prediction on an image is most impacted by visually similar training samples. Our approach serves as a simple and efficient baseline for data attribution on images.
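
To make "attribution in a pretrained feature space" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that embeddings for the training and test images have already been extracted by some self-supervised backbone (represented here by small hand-written arrays), and it assumes cosine similarity with a top-k ranking as the similarity rule; the abstract itself does not specify either choice. The function name `attribute_by_feature_similarity` is hypothetical.

```python
import numpy as np


def attribute_by_feature_similarity(
    test_feats: np.ndarray, train_feats: np.ndarray, k: int = 5
) -> np.ndarray:
    """Hypothetical helper: for each test embedding, return the indices of the
    k training embeddings with the highest cosine similarity (most similar first).

    Assumes test_feats has shape (n_test, d) and train_feats has shape (n_train, d),
    both produced by the same (assumed) self-supervised backbone.
    """
    eps = 1e-12  # guards against division by zero for all-zero vectors
    # L2-normalize rows so that dot products equal cosine similarities.
    test_n = test_feats / (np.linalg.norm(test_feats, axis=1, keepdims=True) + eps)
    train_n = train_feats / (np.linalg.norm(train_feats, axis=1, keepdims=True) + eps)
    sims = test_n @ train_n.T  # shape (n_test, n_train)
    # Sort by descending similarity and keep the k highest-scoring training indices.
    return np.argsort(-sims, axis=1, kind="stable")[:, :k]


# Toy usage: 2-D stand-in "embeddings" instead of real backbone features.
train = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]])
test = np.array([[1.0, 0.0]])
print(attribute_by_feature_similarity(test, train, k=2))  # [[0 2]]
```

Because the ranking needs only one forward pass per image through a single backbone and a similarity search, its cost is unrelated to the size of a model ensemble, which is the efficiency argument the abstract makes.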

attribution method, datamodel, training sample, (14 more...)

Country:
- North America > United States
  - New York (0.04)
  - Maryland (0.04)
- Europe
  - France (0.04)
  - Germany > Baden-Württemberg
    - Tübingen Region > Tübingen (0.04)

Genre:
- Research Report > New Finding (0.86)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)