Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

Neural Information Processing Systems

Training data attribution (TDA) is concerned with understanding model behavior in terms of the training data. This paper draws attention to the common setting where one has access only to the final trained model, and not the training algorithm or intermediate information from training.