Training Data Attribution via Approximate Unrolling
Neural Information Processing Systems
Many training data attribution (TDA) methods aim to estimate how a model's behavior would change if one or more data points were removed from the training set. Methods based on implicit differentiation, such as influence functions, can be made computationally efficient, but fail to account for underspecification, the implicit bias of the optimization algorithm, or multi-stage training pipelines. By contrast, methods based on unrolling address these issues but face scalability challenges.
Dec-26-2025, 09:49:48 GMT