Training Data Attribution via Approximate Unrolled Differentiation
Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse
arXiv.org Artificial Intelligence
Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges.
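As a rough illustration of the leave-one-out quantity described above, the following sketch compares exact retraining with the influence-function estimate on a toy one-dimensional ridge regression. The model, the data, and the names `fit`, `loo_exact`, and `loo_influence` are assumptions made for illustration and do not come from the paper. Because this objective is strictly convex with a unique optimum, the failure modes listed above (underspecification, optimizer bias, multi-stage training) do not arise here, and the unrolling-based alternative is not shown.

```python
# Toy 1-D ridge regression comparing exact leave-one-out retraining with the
# influence-function (implicit-differentiation) estimate.
# All names and data are illustrative assumptions, not taken from the paper.

def fit(xs, ys, lam, n):
    """Closed-form minimizer of (1/n) * sum(0.5*(w*x - y)**2) + 0.5*lam*w**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)

def loo_exact(xs, ys, lam, k):
    """Retrain without point k, keeping the original 1/n normalization."""
    n = len(xs)
    return fit(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:], lam, n)

def loo_influence(xs, ys, lam, k):
    """First-order estimate: w* + (1/n) * H^{-1} * grad of point k's loss."""
    n = len(xs)
    w = fit(xs, ys, lam, n)
    hessian = sum(x * x for x in xs) / n + lam   # curvature of the full objective
    grad_k = (w * xs[k] - ys[k]) * xs[k]         # gradient of point k's loss at w*
    return w + grad_k / (n * hessian)

# Hypothetical toy data (roughly y = 2x plus noise).
xs = [0.5, -1.0, 1.5, 2.0, -0.5, 1.0, -1.5, 0.8]
ys = [1.1, -2.1, 2.9, 4.2, -0.8, 2.0, -3.1, 1.5]
lam = 0.1

w_full = fit(xs, ys, lam, len(xs))
for k in range(len(xs)):
    exact = loo_exact(xs, ys, lam, k)
    approx = loo_influence(xs, ys, lam, k)
    print(f"k={k}  exact shift={exact - w_full:+.5f}  "
          f"influence shift={approx - w_full:+.5f}")
```

In this toy setting the influence estimate uses the curvature of the full objective, whereas the exact answer uses the curvature with point k removed, so the two agree most closely for points whose `x*x / n` is small relative to the Hessian.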
May-21-2024