A theory of learning data statistics in diffusion models, from easy to hard
Lorenzo Bardone, Claudia Merger, Sebastian Goldt
While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.
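The abstract's central contrast, that pairwise statistics are learned at much lower sample complexity than fourth-order statistics, can be illustrated with a toy moment-estimation experiment. This is a minimal sketch, not the paper's mixed cumulant model or its denoiser: the choice of a Laplace distribution, the function name `estimation_errors`, and the sample sizes are all illustrative assumptions.

```python
import numpy as np

def estimation_errors(n, seed=0):
    """Toy illustration: estimate a second-order statistic (variance) and a
    fourth-order statistic (excess kurtosis) from n samples, and return the
    absolute errors against their known population values.

    Uses a standard Laplace distribution, whose population variance is 2
    and whose population excess kurtosis is 3. This is an illustrative
    stand-in for the paper's mixed cumulant model, not the model itself.
    """
    rng = np.random.default_rng(seed)
    x = rng.laplace(size=n)
    var_hat = x.var()
    kurt_hat = ((x - x.mean()) ** 4).mean() / var_hat**2 - 3.0
    return abs(var_hat - 2.0), abs(kurt_hat - 3.0)

# At equal sample size, the fourth-order statistic is typically recovered
# far less accurately than the pairwise one, echoing the gap in sample
# complexity that the paper proves for the denoiser.
for n in (10**3, 10**5):
    e_var, e_kurt = estimation_errors(n)
    print(f"n={n:>6}  |var error|={e_var:.4f}  |kurtosis error|={e_kurt:.4f}")
```

The gap arises because the variance of a fourth-moment estimator is controlled by the eighth population moment, which is much larger than the fourth moment that controls the variance estimator; this is a standard statistical fact, separate from the paper's diffusion-specific analysis.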
Response to reviewers for the paper: "On Lazy Training in Differentiable Programming"
We thank the reviewers for their comments and suggestions. Below, we list the reviewers' (sometimes paraphrased) questions together with our answers; each answer will translate into a clarification in the final version. Reviewers #2 and #3 felt that our message lacked clarity (see A.2). We will add more pointers to the relevant statistical analysis in the existing literature (e.g. L81-90 in the main paper; often α(m) = 1/m in these works).