Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance - Supplementary Material

Neural Information Processing Systems

In this supplementary material, we provide additional discussions and results. In Appendix B, we provide more results on various tasks, i.e., blind face restoration and old photo restoration (Sec. B.2). The inference process involves hyperparameters belonging to three categories, among them parameters for optional quality enhancement (e.g., the timestep range over which multiple gradient steps take place); Table 1 lists the default hyperparameter settings in our experiments, organized by task into sampling, partial guidance, and optional parameters. Fig. 1 compares results when all other inference settings are the same: input faces are corrupted by real-world degradations, and our method produces high-quality faces with faithful details (zoom in for best view). This work focuses on restoring images corrupted by various forms of degradation; the restored content could potentially convey deceptive information, such as incorrect identity recognition.
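The excerpt above mentions taking multiple gradient steps within a chosen timestep range as an optional quality-enhancement hyperparameter. Below is a minimal sketch of what such a guided denoising step could look like; the names (`denoiser`, `guidance_loss`, `grad_range`) and the update rule are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch only: guided reverse-diffusion step with multiple
# gradient steps applied inside a chosen timestep range. All names and
# defaults here are hypothetical, not the paper's actual API.
import torch

def guided_step(x_t, t, denoiser, guidance_loss,
                n_grad_steps=3, step_size=0.1, grad_range=(200, 800)):
    """One denoising step; extra guidance gradients are taken only while
    t falls in grad_range (the 'range for multiple gradient steps')."""
    x0_pred = denoiser(x_t, t)                     # predicted clean image
    if grad_range[0] <= t <= grad_range[1]:
        for _ in range(n_grad_steps):
            x0_pred = x0_pred.detach().requires_grad_(True)
            loss = guidance_loss(x0_pred)          # e.g. fidelity to the degraded input
            (grad,) = torch.autograd.grad(loss, x0_pred)
            x0_pred = x0_pred - step_size * grad   # nudge prediction toward the guidance
    return x0_pred.detach()

# Toy usage with stand-in components.
denoiser = lambda x, t: x * 0.9                    # dummy denoiser
target = torch.zeros(1, 3, 8, 8)
guidance_loss = lambda x: ((x - target) ** 2).mean()
out = guided_step(torch.randn(1, 3, 8, 8), t=500, denoiser=denoiser,
                  guidance_loss=guidance_loss)
```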



Neural Information Processing Systems

In this section, we examine our theoretical results with controlled experiments on synthetic data. We do not have a complete explanation for such spikes. At first glance, overfitting could happen when the number of linear measurements is less than the size of the ground-truth matrix. Moreover, when the measurements satisfy RIP, Li et al. and Soltanolkotabi [45] show that GD exactly recovers the ground truth. To the best of our knowledge, most existing generalization analyses for flat regularization are for two-layer models, e.g., Li et al.
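As a concrete illustration of the controlled synthetic-data setting described above, the sketch below runs gradient descent on an overparameterized factorization $X = UU^\top$ against random Gaussian linear measurements of a low-rank ground truth, with fewer measurements than matrix entries. The sizes, step size, and initialization scale are assumptions chosen for illustration, not the paper's experimental configuration.

```python
# Controlled synthetic experiment in the spirit of the excerpt: recover a
# low-rank ground-truth matrix from random Gaussian linear measurements
# (RIP holds w.h.p.) via gradient descent on an overparameterized
# factorization X = U U^T with small initialization. Sizes, step size,
# and initialization scale are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 200                    # matrix size, true rank, #measurements (m < n^2)
B = rng.standard_normal((r, n))
X_star = (B.T @ B) / n                  # rank-r ground truth with O(1) spectrum
A = rng.standard_normal((m, n, n)) / np.sqrt(m)    # sensing matrices A_k
y = np.einsum('kij,ij->k', A, X_star)   # measurements y_k = <A_k, X*>

U = 1e-3 * rng.standard_normal((n, n))  # small random init, fully overparameterized
lr = 0.1
for _ in range(5000):
    resid = np.einsum('kij,ij->k', A, U @ U.T) - y
    grad_X = np.einsum('k,kij->ij', resid, A)      # gradient of the loss w.r.t. X
    U -= lr * (grad_X + grad_X.T) @ U              # chain rule through X = U U^T

rel_err = np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star)
print(f"relative recovery error: {rel_err:.2e}")   # small despite m < n^2
```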



A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs

Kalra, Dayal Singh, Gagnon-Audet, Jean-Christophe, Gromov, Andrey, Mediratta, Ishita, Niu, Kelvin, Miller, Alexander H, Shvartsman, Michael

arXiv.org Machine Learning

Understanding the curvature evolution of the loss landscape is fundamental to analyzing the training dynamics of neural networks. The most commonly studied measure, Hessian sharpness ($\lambda_{\max}^H$) -- the largest eigenvalue of the loss Hessian -- determines local training stability and interacts with the learning rate throughout training. Despite its significance in analyzing training dynamics, direct measurement of Hessian sharpness remains prohibitive for Large Language Models (LLMs) due to high computational cost. We analyze $\textit{critical sharpness}$ ($\lambda_c$), a computationally efficient measure requiring fewer than $10$ forward passes given the update direction $\Delta\boldsymbol{\theta}$. Critically, this measure captures well-documented Hessian sharpness phenomena, including progressive sharpening and Edge of Stability. Using this measure, we provide the first demonstration of these sharpness phenomena at scale, up to $7$B parameters, spanning both pre-training and mid-training of OLMo-2 models. We further introduce $\textit{relative critical sharpness}$ ($\lambda_c^{1\to 2}$), which quantifies the curvature of one loss landscape while optimizing another, to analyze the transition from pre-training to fine-tuning and guide data mixing strategies. Critical sharpness provides practitioners with a practical tool for diagnosing curvature dynamics and informing data composition choices at scale. More broadly, our work shows that scalable curvature measures can provide actionable insights for large-scale training.
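Since critical sharpness needs only a handful of forward passes given the update direction $\Delta\boldsymbol{\theta}$, a cheap directional-curvature probe conveys the flavor of such a measure. The sketch below estimates $u^\top H u$ along the normalized update direction with central finite differences (about three forward passes); it is an illustrative stand-in under that assumption, not the paper's exact definition of $\lambda_c$.

```python
# Hedged sketch: a forward-pass-only curvature probe along the update
# direction, estimating u^T H u via central finite differences,
# (L(θ+εu) - 2L(θ) + L(θ-εu)) / ε², with ~3 forward passes. This is an
# illustrative stand-in, not the paper's exact definition of λ_c.
import torch

def directional_curvature(model, loss_fn, batch, delta, eps=1e-3):
    """Finite-difference curvature along the unit direction u = delta/||delta||."""
    params = list(model.parameters())
    norm = torch.cat([d.reshape(-1) for d in delta]).norm()
    u = [d / norm for d in delta]

    @torch.no_grad()
    def loss_at(shift):
        for p, d in zip(params, u):
            p.add_(shift * d)                      # perturb parameters in place
        val = loss_fn(model(batch[0]), batch[1]).item()
        for p, d in zip(params, u):
            p.add_(-shift * d)                     # restore parameters
        return val

    l0, lp, lm = loss_at(0.0), loss_at(eps), loss_at(-eps)
    return (lp - 2 * l0 + lm) / eps**2

# Toy usage: a hypothetical linear model and a random stand-in for Δθ.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
delta = [torch.randn_like(p) for p in model.parameters()]
print(directional_curvature(model, torch.nn.functional.mse_loss, (x, y), delta))
```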