training example
A Bayesian Information-Theoretic Approach to Data Attribution
Tailor, Dharmesh, Felicioni, Nicolò, Ciosek, Kamil
Training Data Attribution (TDA) seeks to trace model predictions back to influential training examples, enhancing interpretability and safety. We formulate TDA as a Bayesian information-theoretic problem: subsets are scored by the information loss they induce, that is, the increase in entropy at a query when the subset is removed. This criterion credits examples for resolving predictive uncertainty rather than label noise. To scale to modern networks, we approximate information loss using a Gaussian Process surrogate built from tangent features. We show this aligns with classical influence scores for single-example attribution while promoting diversity for subsets. For even larger-scale retrieval, we relax to an information-gain objective and add a variance correction for scalable attribution in vector databases. Experiments show competitive performance on counterfactual sensitivity, ground-truth retrieval and coreset selection, indicating that our method scales to modern architectures while bridging principled measures with practice.
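The information-loss score described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Bayesian linear model with Gaussian noise over fixed feature vectors (standing in for the tangent features), and the names `predictive_variance`, `information_loss`, `prior_var` and `noise_var` are hypothetical. Under these assumptions the predictive distribution at the query is Gaussian, so the entropy increase from removing a subset is half the log-ratio of the predictive variances.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def predictive_variance(features, query, prior_var=1.0, noise_var=0.1):
    """Predictive variance at `query` for a Bayesian linear model (assumed surrogate)."""
    d = len(query)
    # Posterior precision: prior precision plus one rank-one term per training example.
    A = [[(1.0 / prior_var if i == j else 0.0) for j in range(d)] for i in range(d)]
    for phi in features:
        for i in range(d):
            for j in range(d):
                A[i][j] += phi[i] * phi[j] / noise_var
    v = solve(A, query)
    return sum(q * vi for q, vi in zip(query, v)) + noise_var

def information_loss(features, subset, query, **kw):
    """Entropy increase at `query` when the examples indexed by `subset` are removed."""
    kept = [phi for i, phi in enumerate(features) if i not in subset]
    var_full = predictive_variance(features, query, **kw)
    var_removed = predictive_variance(kept, query, **kw)
    # Difference of two Gaussian entropies: 0.5 * log(var_removed / var_full).
    return 0.5 * math.log(var_removed / var_full)

# Toy data (made up for illustration): examples 0 and 1 are nearly redundant.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 1.0]
loss_single = information_loss(features, {0}, query)
loss_pair = information_loss(features, {0, 2}, query)
```

In this toy setup the score depends only on the features, not on the labels, which is one way to read the abstract's remark that the criterion credits examples for resolving predictive uncertainty rather than label noise.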
On the role of memorization in learned priors for geophysical inverse problems
Siahkoohi, Ali, Sabeddu, Davide
Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models -- a resource that is inherently scarce in geoscience applications. Since the training objective of most generative models can be cast as maximum likelihood on a finite dataset, any such model risks converging to the empirical distribution -- effectively memorizing the training examples rather than learning the underlying geological distribution. We show that the posterior under such a memorized prior reduces to a reweighted empirical distribution -- i.e., a likelihood-weighted lookup among the stored training examples. For diffusion models specifically, memorization yields a Gaussian mixture prior in closed form, and linearizing the forward operator around each training example gives a Gaussian mixture posterior whose components have widths and shifts governed by the local Jacobian. We validate these predictions on a stylized inverse problem and demonstrate the consequences of memorization through diffusion posterior sampling for full waveform inversion.
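The "likelihood-weighted lookup" in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes additive Gaussian observation noise and equal prior mass on each stored training example, and the names `memorized_posterior_weights`, `forward` and `noise_std` are hypothetical. Under a memorized (empirical) prior, the posterior weight of each training example is proportional to the likelihood of the observed data under that example.

```python
import math

def memorized_posterior_weights(train_models, forward, y_obs, noise_std):
    """Posterior weights over stored training examples under an empirical prior.

    Assumes y = forward(x) + Gaussian noise with standard deviation `noise_std`.
    """
    log_liks = []
    for x in train_models:
        pred = forward(x)
        sq_residual = sum((p - o) ** 2 for p, o in zip(pred, y_obs))
        log_liks.append(-0.5 * sq_residual / noise_std ** 2)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_liks)
    unnorm = [math.exp(l - m) for l in log_liks]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Toy problem (made up for illustration): the forward operator sums two parameters.
train_models = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]]
forward = lambda x: [x[0] + x[1]]
y_obs = [2.1]
weights = memorized_posterior_weights(train_models, forward, y_obs, noise_std=0.5)
best = max(range(len(weights)), key=lambda i: weights[i])
```

As `noise_std` shrinks, the weights concentrate on the single stored example whose simulated data best matches the observation, so posterior samples reduce to retrieving training examples.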
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > New Jersey (0.04)
- North America > Canada (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East > Jordan (0.04)
1 Datasheet for QM1B
As recommended by the NeurIPS Datasets and Benchmarks track, we documented QM1B and its intended uses through the Datasheets for Datasets framework [1]. The goal of dataset datasheets, as outlined by [1], is to provide a standardized process for documenting datasets. The authors of [1] present a list of carefully selected questions which dataset authors should answer. We hope our answers to these questions will facilitate better communication between us (the dataset creators) and future users of QM1B. For what purpose was the dataset created? Prior Gaussian-based Density Functional Theory (DFT) datasets contained fewer than 20 million training examples.
- North America > United States > Connecticut > New Haven County > Wallingford (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom (0.04)
- Asia > Bhutan (0.05)
- North America > United States > California (0.04)
- Africa > Sudan (0.04)
- Africa > Middle East > Egypt (0.04)
- Banking & Finance > Economy (1.00)
- Education > Educational Setting (0.70)