AITopics | initial phase

The Crucial Role of Normalization in Sharpness-Aware Minimization

Yan Dai

Neural Information Processing Systems | Feb-17-2026, 08:34:41 GMT

Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks.
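
As a rough illustration of the optimizer named in this entry, the sketch below contrasts one SAM step, which perturbs the weights along the normalized gradient before descending, with the same step taken without normalization (presumably what the "usam" keyword refers to). The toy quadratic loss, the step size `eta`, the radius `rho`, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def grad(w):
    """Gradient of the toy loss f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2) (assumed for illustration)."""
    return [w[0], 10.0 * w[1]]


def sam_step(w, eta=0.05, rho=0.1, normalize=True):
    """One ascent-then-descent step.

    normalize=True : perturb by rho * g / ||g||  (SAM as in Foret et al.)
    normalize=False: perturb by rho * g          (unnormalized variant)
    """
    g = grad(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if normalize:
        # Guard against a zero gradient; in that case no perturbation is applied.
        scale = rho / norm if norm > 0 else 0.0
    else:
        scale = rho
    # Perturbed point at which the descent gradient is evaluated.
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]
    g_adv = grad(w_adv)
    # Descent step from the original weights using the perturbed gradient.
    return [wi - eta * gi for wi, gi in zip(w, g_adv)]


w_sam = sam_step([2.0, 0.0], normalize=True)
w_usam = sam_step([2.0, 0.0], normalize=False)
```

With normalization the perturbation always has length `rho`, whereas without it the perturbation grows with the gradient norm, so the two steps differ whenever the gradient norm is not 1.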

artificial intelligence, machine learning, usam, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

d616a353c711f11c722e3f28d2d9e956-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 08:34:38 GMT

artificial intelligence, machine learning, usam, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

cba76ef96c4cd625631ab4d33285b045-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-11-2026, 22:48:32 GMT

convergent phase, neuron, temporal structure, (13 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

61162d94822d468ee6e92803340f2040-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 09:06:57 GMT

In data-parallel optimization of machine learning models, workers collaborate to improve their estimates of the model: more accurate gradients allow them to use larger learning rates and optimize faster. We consider the setting in which all workers sample from the same dataset, and communicate over a sparse graph (decentralized).

artificial intelligence, machine learning, topology, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Chatbots can sway political opinions but are 'substantially' inaccurate, study finds

The Guardian | Dec-4-2025, 19:00:28 GMT

The study said tweaking a model after its initial phase of development was an important factor in making it more persuasive. 'Information-dense' AI responses are most persuasive but these tend to be less accurate, says security report. Chatbots can sway people's political opinions but the most persuasive artificial intelligence models deliver "substantial" amounts of inaccurate information in the process, according to the UK government's AI security body. Researchers said the study was the largest and most systematic investigation of AI persuasiveness to date, involving nearly 80,000 British participants holding conversations with 19 different AI models. The AI Security Institute carried out the study amid fears that chatbots can be deployed for illegal activities including fraud and grooming.

artificial intelligence, chatbot, natural language, (11 more...)

The Guardian

Country:

Europe > Ukraine (0.07)
Oceania > Australia (0.05)
North America > United States > Massachusetts (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)

Industry:

Leisure & Entertainment > Sports (0.98)
Government > Regional Government > Europe Government > United Kingdom Government (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

d616a353c711f11c722e3f28d2d9e956-Supplemental-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 08:39:50 GMT

phase, proof, usam, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

d616a353c711f11c722e3f28d2d9e956-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 08:39:46 GMT

phase, proof, usam, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

cba76ef96c4cd625631ab4d33285b045-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-18-2025, 23:27:11 GMT

artificial intelligence, machine learning, temporal structure, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Criteria and Bias of Parameterized Linear Regression under Edge of Stability Regime

Zhang, Peiyuan, Karbasi, Amin

arXiv.org Machine Learning | Dec-10-2024

Classical optimization theory requires a small step-size for gradient-based methods to converge. Nevertheless, recent findings challenge the traditional idea by empirically demonstrating that Gradient Descent (GD) converges even when the step-size $\eta$ exceeds the threshold of $2/L$, where $L$ is the global smoothness constant. This is usually known as the Edge of Stability (EoS) phenomenon. A widely held belief suggests that an objective function with subquadratic growth plays an important role in incurring EoS. In this paper, we provide a more comprehensive answer by considering the task of finding a linear interpolator $\beta \in \mathbb{R}^{d}$ for regression with loss function $l(\cdot)$, where $\beta$ admits parameterization as $\beta = w^2_{+} - w^2_{-}$. Contrary to the previous work that suggests a subquadratic $l$ is necessary for EoS, our novel finding reveals that EoS occurs even when $l$ is quadratic under proper conditions. This argument is made rigorous by both empirical and theoretical evidence, demonstrating the GD trajectory converges to a linear interpolator in a non-asymptotic way. Moreover, the model under quadratic $l$, also known as a depth-$2$ diagonal linear network, remains largely unexplored under the EoS regime. Our analysis then sheds some new light on the implicit bias of diagonal linear networks when a larger step-size is employed, enriching the understanding of EoS on more practical models.
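
The classical $2/L$ threshold mentioned in this abstract can be checked on the simplest possible case. The sketch below runs GD on the one-dimensional quadratic $f(x) = \tfrac{L}{2}x^2$, where each step multiplies the iterate by $(1 - \eta L)$, so the iterates shrink only when $\eta < 2/L$. It illustrates the classical baseline only, not the paper's reparameterized setting $\beta = w^2_{+} - w^2_{-}$; the function name and the constants are assumptions chosen for illustration.

```python
def gd_quadratic(eta, L=1.0, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * L * x**2 and return the final iterate."""
    x = x0
    for _ in range(steps):
        grad = L * x          # f'(x) = L * x
        x = x - eta * grad    # equivalent to x *= (1 - eta * L)
    return x


L = 4.0
threshold = 2.0 / L  # classical stability threshold for the step-size

# Step-size slightly below the threshold: |1 - eta * L| < 1, iterates shrink.
below = abs(gd_quadratic(0.9 * threshold, L=L))
# Step-size slightly above the threshold: |1 - eta * L| > 1, iterates blow up.
above = abs(gd_quadratic(1.1 * threshold, L=L))
```

On this plain quadratic, exceeding $2/L$ always diverges, which is why convergence beyond the threshold in the reparameterized model is the notable claim.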

convergence, iteration, regime, (13 more...)

arXiv.org Machine Learning

2412.08025

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Dynamics of Concept Learning and Compositional Generalization

Yang, Yongyi, Park, Core Francisco, Lubana, Ekdeep Singh, Okawa, Maya, Hu, Wei, Tanaka, Hidenori

arXiv.org Machine Learning | Oct-10-2024

Prior work has shown that text-conditioned diffusion models can learn to identify and manipulate primitive concepts underlying a compositional data-generating process, enabling generalization to entirely novel, out-of-distribution compositions. Beyond performance evaluations, these studies develop a rich empirical phenomenology of learning dynamics, showing that models generalize sequentially, respecting the compositional hierarchy of the data-generating process. Moreover, concept-centric structures within the data significantly influence a model's speed of learning the ability to manipulate a concept. In this paper, we aim to better characterize these empirical results from a theoretical standpoint. Specifically, we propose an abstraction of prior work's compositional generalization problem by introducing a structured identity mapping (SIM) task, where a model is trained to learn the identity mapping on a Gaussian mixture with structurally organized centroids. We mathematically analyze the learning dynamics of neural networks trained on this SIM task and show that, despite its simplicity, SIM's learning dynamics capture and help explain key empirical observations on compositional generalization with diffusion models identified in prior work. Our theory also offers several new insights -- e.g., we find a novel mechanism for non-monotonic learning dynamics of test loss in early phases of training. We validate our new predictions by training a text-conditioned diffusion model, bridging our simplified framework and complex generative models. Overall, this work establishes the SIM task as a meaningful theoretical abstraction of concept learning dynamics in modern generative models.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Machine Learning

2410.08309

Country: