

A theory of learning data statistics in diffusion models, from easy to hard

Bardone, Lorenzo, Merger, Claudia, Goldt, Sebastian

arXiv.org Machine Learning

While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations, which we call the diffusion information exponent, in analogy to related invariants in other learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.
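The diagnostic implied by the abstract is easy to probe numerically: compare how quickly a model's samples match the data's pair-wise statistics (the covariance) versus a higher-order statistic (the fourth cumulant along a fixed direction). Below is a minimal Python sketch of such a check; it is our illustration, not the authors' code, and data and samples stand in for real inputs and model outputs.

# Minimal sketch (not the authors' code) of the diagnostic implied by the
# abstract: measure the mismatch in pair-wise statistics (covariance) and in
# a higher-order statistic (fourth cumulant along a direction u).
import numpy as np

def covariance_error(data, samples):
    """Frobenius distance between empirical covariances (pair-wise stats)."""
    return float(np.linalg.norm(np.cov(data, rowvar=False)
                                - np.cov(samples, rowvar=False)))

def fourth_cumulant(x, u):
    """Empirical fourth cumulant of the centered projection z = x @ u:
    kappa_4 = E[z^4] - 3 * (E[z^2])^2."""
    z = (x - x.mean(axis=0)) @ u
    return float(np.mean(z**4) - 3.0 * np.mean(z**2) ** 2)

rng = np.random.default_rng(0)
d = 16
data = rng.standard_normal((10_000, d)) ** 3   # toy non-Gaussian "inputs"
samples = rng.standard_normal((10_000, d))     # stand-in for model samples
u = rng.standard_normal(d); u /= np.linalg.norm(u)

print("covariance error:", covariance_error(data, samples))
print("kappa_4 gap:", fourth_cumulant(data, u) - fourth_cumulant(samples, u))

Under the simplicity bias described in the abstract, the covariance error should shrink early in training while the kappa_4 gap closes only much later.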



Response to reviewers for the paper: "On Lazy Training in Differentiable Programming"

Neural Information Processing Systems

We thank the reviewers for their comments and suggestions. Hereafter, we list the reviewers' (sometimes paraphrased) comments together with our answers. Each answer will translate into a clarification in the final version. Reviewers #2 and #3 felt that our message was lacking clarity (see A.2). We will add more pointers to the statistical analysis in the existing literature (e.g. L81-90 in the main paper; often α(m) = 1/√m in these works).
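For context, the scaling α(m) at issue rescales the output of a width-m network; with the NTK-style choice α(m) = 1/√m, predictions stay O(1) as the width grows. Below is a minimal sketch (our illustration with hypothetical names, not the authors' code) of a two-layer ReLU network under this scaling.

# Minimal sketch of the output scaling alpha(m) = 1/sqrt(m) for a two-layer
# ReLU network; the sum of m O(1) random terms grows like sqrt(m), so the
# rescaled prediction stays O(1) as the width m increases.
import numpy as np

def two_layer(x, W, a):
    m = W.shape[0]
    alpha = 1.0 / np.sqrt(m)                     # alpha(m) = 1/sqrt(m)
    return alpha * (a @ np.maximum(W @ x, 0.0))  # alpha * sum_i a_i ReLU(w_i . x)

rng = np.random.default_rng(0)
d = 10
x = rng.standard_normal(d)
for m in (64, 1024, 16384):                      # output stays O(1) as m grows
    W = rng.standard_normal((m, d))
    a = rng.standard_normal(m)
    print(m, two_layer(x, W, a))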





Benign Overfitting in Two-layer Convolutional Neural Networks

Neural Information Processing Systems

Modern neural networks often have great expressive power and can be trained to overfit the training data, while still achieving good test performance.
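This phenomenon has a simple linear analogue that can be sketched in a few lines. The toy below (our illustration, not the paper's two-layer CNN setting) fits noisy labels exactly with a minimum-norm interpolator yet still predicts the clean signal well, because one high-variance direction carries the signal while the many remaining coordinates absorb the noise; all parameters are hypothetical.

# Minimal sketch of benign overfitting in a linear toy model: the min-norm
# interpolator achieves ~zero training error on noisy labels, yet its test
# error stays well below the null risk (~1) because the noise is spread
# across the many low-variance coordinates.
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 100, 2000, 0.5            # n samples, d >> n features, label noise

g = rng.standard_normal(n)              # latent signal
X = rng.standard_normal((n, d))
X[:, 0] = 30.0 * g                      # one high-variance signal direction
y = g + sigma * rng.standard_normal(n)  # noisy labels

# lstsq returns the minimum-norm interpolating solution when d > n
beta = np.linalg.lstsq(X, y, rcond=None)[0]

g_test = rng.standard_normal(1000)
X_test = rng.standard_normal((1000, d))
X_test[:, 0] = 30.0 * g_test
print("train MSE:", np.mean((X @ beta - y) ** 2))           # ~0: noise fit exactly
print("test  MSE:", np.mean((X_test @ beta - g_test) ** 2)) # small vs. null risk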