strong solution
Unveiling m-Sharpness Through the Structure of Stochastic Gradient Noise
Sharpness-aware minimization (SAM) has emerged as a highly effective technique to improve model generalization, but its underlying principles are not fully understood. We investigate m-sharpness, where SAM performance improves monotonically as the micro-batch size for computing perturbations decreases, a phenomenon critical for distributed training yet lacking rigorous explanation. We leverage an extended Stochastic Differential Equation (SDE) framework and analyze stochastic gradient noise (SGN) to characterize the dynamics of SAM variants, including n-SAM and m-SAM. Our analysis reveals that stochastic perturbations induce an implicit variance-based sharpness regularization whose strength increases as m decreases. Motivated by this insight, we propose Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable.
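As a rough illustration of the micro-batch perturbation described in this abstract, the sketch below computes an m-SAM style gradient for a toy least-squares loss. It follows the common formulation in which each micro-batch of size m gets its own normalized ascent step. The abstract does not spell out the authors' algorithm, so the loss, the names `grad_mse` and `m_sam_gradient`, and the radius `rho` are all illustrative assumptions. RW-SAM is not shown.

```python
import numpy as np


def grad_mse(w, X, y):
    """Gradient of the toy loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def m_sam_gradient(w, X, y, m, rho=0.05, eps=1e-12):
    """Average of SAM gradients computed separately on micro-batches of size m.

    Assumed formulation (not taken from the abstract): for each micro-batch,
    step a distance rho along its own normalized gradient, then evaluate the
    gradient of the same micro-batch loss at the perturbed point.
    """
    n = len(y)
    if n % m != 0:
        raise ValueError("batch size must be divisible by m")
    grads = []
    for start in range(0, n, m):
        Xb, yb = X[start:start + m], y[start:start + m]
        g = grad_mse(w, Xb, yb)                          # micro-batch gradient
        w_adv = w + rho * g / (np.linalg.norm(g) + eps)  # micro-batch perturbation
        grads.append(grad_mse(w_adv, Xb, yb))            # gradient at perturbed point
    return np.mean(grads, axis=0)
```

Setting `m` equal to the batch size gives a single perturbation from the whole mini-batch gradient, while smaller `m` makes each perturbation noisier. That is the regime in which the abstract reports a stronger variance-based regularization.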
Reflected diffusion models adapt to low-dimensional data
Holk, Asbjørn, Strauch, Claudia, Trottner, Lukas
While the mathematical foundations of score-based generative models are increasingly well understood for unconstrained Euclidean spaces, many practical applications involve data restricted to bounded domains. This paper provides a statistical analysis of reflected diffusion models on the hypercube $[0,1]^D$ for target distributions supported on $d$-dimensional linear subspaces. A primary challenge in this setting is the absence of Gaussian transition kernels, which play a central role in standard theory in $\mathbb{R}^D$. By employing an easily implementable infinite series expansion of the transition densities, we develop analytic tools to bound the score function and its approximation by sparse ReLU networks. For target densities with Sobolev smoothness $\alpha$, we establish a convergence rate in the $1$-Wasserstein distance of order $n^{-\frac{\alpha+1-\delta}{2\alpha+d}}$ for arbitrarily small $\delta > 0$, demonstrating that the generative algorithm fully adapts to the intrinsic dimension $d$. These results confirm that the presence of reflecting boundaries does not degrade the fundamental statistical efficiency of the diffusion paradigm, matching the almost optimal rates known for unconstrained settings.
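To give a feel for what an infinite series expansion of a reflected transition density can look like, the sketch below evaluates the classical one-dimensional density of standard Brownian motion reflected on $[0,1]$ in two equivalent ways: a sum over mirror images of the Gaussian kernel, and a cosine eigenfunction series. This is a textbook example chosen for illustration. The abstract does not state the forward process or the expansion used in the paper, and the function names and truncation levels `K` and `N` are assumptions.

```python
import numpy as np


def reflected_density_images(x, y, t, K=50):
    """Transition density of reflected Brownian motion on [0, 1] via mirror images.

    Sums Gaussian kernels with variance t centred at the images x + 2k and
    -x + 2k of the starting point, truncated to |k| <= K.
    """
    k = np.arange(-K, K + 1)

    def phi(z):
        return np.exp(-z ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

    return float(np.sum(phi(y - x + 2 * k) + phi(y + x + 2 * k)))


def reflected_density_cosine(x, y, t, N=200):
    """Same density via the Neumann eigenfunction (cosine) series, truncated at N."""
    n = np.arange(1, N + 1)
    decay = np.exp(-0.5 * (n * np.pi) ** 2 * t)
    return float(1.0 + 2.0 * np.sum(decay * np.cos(n * np.pi * x) * np.cos(n * np.pi * y)))
```

The image sum converges quickly for small `t` and the cosine series for large `t`. For independent coordinates on $[0,1]^D$, the density would be a product of such one-dimensional factors.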
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Mean field approximation and propagation of chaos for mSGLD. S3 Technical results. S4 Quantitative propagation of chaos. S4.1 Existence of strong solutions to the particle SDE. If F = R, then we simply note C(E). S2.1 Presentation of the modified SGLD and its continuous counterpart. The proof is postponed to Section S4.4. Consider now the mean-field SDE starting from a random variable W. The proof is postponed to Section S4.4. Then, there exists L 0 such that the following hold. In what follows, we bound separately the two terms in the right-hand side.