Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

Neural Information Processing Systems

Sharpness-aware minimization (SAM) improves generalization of various deep learning tasks. Motivated by popular architectures such as LoRA, we explore the implicit regularization of SAM for scale-invariant problems involving two groups of variables. Instead of focusing on commonly used sharpness, this work introduces a concept termed balancedness, defined as the difference between the squared norm of two variables. This allows us to depict richer global behaviors of SAM. In particular, our theoretical and empirical findings reveal that i) SAM promotes balancedness; and ii) the regularization on balancedness is data-responsive -- outliers have stronger impact. The latter coincides with empirical observations that SAM outperforms SGD in the presence of outliers. Leveraging the implicit regularization, we develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems such as finetuning language models with LoRA. BAR saves 95% of the computational overhead of SAM, with enhanced test performance across various tasks on RoBERTa, GPT2, and OPT-1.3B.
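The balancedness quantity described in this abstract can be sketched in a few lines. The LoRA-style factor shapes and variable names below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def balancedness(A: np.ndarray, B: np.ndarray) -> float:
    # Balancedness as defined in the abstract: the difference between the
    # squared (Frobenius) norms of the two groups of variables, e.g. the
    # two low-rank factors A and B of a LoRA update.
    return np.linalg.norm(A) ** 2 - np.linalg.norm(B) ** 2

# Toy LoRA-style factors (shapes are illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
B = rng.standard_normal((4, 8))
print(balancedness(A, B))  # zero would mean the two groups are perfectly balanced
```

A transpose leaves the Frobenius norm unchanged, so balancedness(A, A.T) is exactly zero, which makes for a quick sanity check.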



Regularization Implies Balancedness in the Deep Linear Network

Lindsey, Kathryn, Menon, Govind

arXiv.org Machine Learning

We use geometric invariant theory (GIT) to study the deep linear network (DLN). The Kempf-Ness theorem is used to establish that the $L^2$ regularizer is minimized on the balanced manifold. This allows us to decompose the training dynamics into two distinct gradient flows: a regularizing flow on fibers and a learning flow on the balanced manifold. We show that the regularizing flow is exactly solvable using the moment map. This approach provides a common mathematical framework for balancedness in deep learning and linear systems theory. We use this framework to interpret balancedness in terms of model reduction and Bayesian principles.



Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems

Li, Bingcong, Zhang, Liang, He, Niao

arXiv.org Machine Learning

Sharpness-aware minimization (SAM) improves generalization of various deep learning tasks. Motivated by popular architectures such as LoRA, we explore the implicit regularization of SAM for scale-invariant problems involving two groups of variables. Instead of focusing on commonly used sharpness, this work introduces a concept termed balancedness, defined as the difference between the squared norm of two variables. This allows us to depict richer global behaviors of SAM. In particular, our theoretical and empirical findings reveal that i) SAM promotes balancedness; and ii) the regularization on balancedness is data-responsive -- outliers have stronger impact. The latter coincides with empirical observations that SAM outperforms SGD in the presence of outliers. Leveraging the implicit regularization, we develop a resource-efficient SAM variant, balancedness-aware regularization (BAR), tailored for scale-invariant problems such as finetuning language models with LoRA. BAR saves 95% of the computational overhead of SAM, with enhanced test performance across various tasks on RoBERTa, GPT2, and OPT-1.3B.


Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

Jacot, Arthur, Súkeník, Peter, Wang, Zihan, Mondelli, Marco

arXiv.org Machine Learning

Among the many possible interpolators that a deep neural network (DNN) can find, Papyan et al. (2020) showed a strong bias of gradient-based training towards representations with a highly symmetric structure in the penultimate layer, which was dubbed neural collapse (NC). In particular, the feature vectors of the training data in the penultimate layer collapse to a single vector per class (NC1); these vectors form orthogonal or simplex equiangular tight frames (NC2), and they are aligned with the last layer's row weight vectors (NC3). The question of why and how neural collapse emerges has been considered by a popular line of research, see e.g. Lu & Steinerberger (2022); E & Wojtowytsch (2022) and the discussion in Section 2. Many of these works focus on a simplified mathematical framework: the unconstrained features model (UFM) (Mixon et al., 2020; Han et al., 2022; Zhou et al., 2022a), corresponding to the joint optimization over the last layer's weights and the penultimate layer's feature representations, which are treated as free variables. To account for the existence of the training data and of all the layers before the penultimate (i.e., the backbone of the network), some form of regularization on the free features is usually added. A number of papers have proved the optimality of NC in this model (Lu & Steinerberger, 2022; E & Wojtowytsch, 2022), its emergence with gradient-based methods (Mixon et al., 2020; Han et al., 2022) and a benign loss landscape (Zhou et al., 2022a; Zhu et al., 2021). However, the major drawback of the UFM lies in its data-agnostic nature: it only acknowledges the presence of training data and backbone through a simple form of regularization (e.g., Frobenius norm or sphere constraint), which is far from being equivalent to end-to-end training.
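The NC1 property mentioned above (within-class collapse of penultimate-layer features) is commonly measured as a ratio of within-class to between-class variability. The sketch below uses synthetic features, and the specific normalization is an assumption for illustration, not the paper's definition:

```python
import numpy as np

def nc1(features: np.ndarray, labels: np.ndarray) -> float:
    # Within-class variability of the features divided by between-class
    # variability; 0 means every class has collapsed to its class mean (NC1).
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in classes:
        fc = features[labels == c]
        mu = fc.mean(axis=0)
        within += ((fc - mu) ** 2).sum()
        between += len(fc) * ((mu - global_mean) ** 2).sum()
    return within / between

# Perfectly collapsed synthetic features: one point per class, repeated.
feats = np.array([[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5)
labels = np.array([0] * 5 + [1] * 5)
print(nc1(feats, labels))  # 0.0 for fully collapsed features
```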


A multi-core periphery perspective: Ranking via relative centrality

Mukherjee, Chandra Sekhar, Zhang, Jiapeng

arXiv.org Machine Learning

Community and core-periphery are two widely studied graph structures, with their coexistence observed in real-world graphs (Rombach, Porter, Fowler \& Mucha [SIAM J. App. Math. 2014, SIAM Review 2017]). However, the nature of this coexistence is not well understood and has been pointed out as an open problem (Yanchenko \& Sengupta [Statistics Surveys, 2023]). In particular, the potential of inferring the core-periphery structure of a graph to shed light on its community structure has not been well exploited. In this direction, we introduce a novel quantification for graphs with ground truth communities, where each community has a densely connected part (the core), and the rest is more sparse (the periphery), with inter-community edges more frequent between the peripheries. Built on this structure, we propose a new algorithmic concept that we call relative centrality to detect the cores. We observe that core-detection algorithms based on popular centrality measures such as PageRank and degree centrality can show some bias in their outcome by selecting very few vertices from some cores. We show that relative centrality solves this bias issue and provide theoretical and simulation support, as well as experiments on real-world graphs. Core detection is known to have important applications with respect to core-periphery structures. In our model, we show a new application: relative-centrality-based algorithms can select a subset of the vertices such that it contains sufficient vertices from all communities, and points in this subset are better separable into their respective communities. We apply the methods to 11 biological datasets, with our methods resulting in a more balanced selection of vertices from all communities such that clustering algorithms have better performance on this set.


Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

Papazov, Hristo, Pesme, Scott, Flammarion, Nicolas

arXiv.org Machine Learning

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent. We leverage a continuous-time approach in the analysis of momentum gradient descent with step size $\gamma$ and momentum parameter $\beta$ that allows us to identify an intrinsic quantity $\lambda = \frac{ \gamma }{ (1 - \beta)^2 }$ which uniquely defines the optimisation path and provides a simple acceleration rule. When training a $2$-layer diagonal linear network in an overparametrised regression setting, we characterise the recovered solution through an implicit regularisation problem. We then prove that small values of $\lambda$ help to recover sparse solutions. Finally, we give similar but weaker results for stochastic momentum gradient descent. We provide numerical experiments which support our claims.
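The intrinsic quantity $\lambda = \gamma / (1 - \beta)^2$ from this abstract is easy to check numerically. The heavy-ball update and the 1-D toy objective below are a sketch under our own assumptions, not the paper's diagonal-linear-network setting:

```python
def intrinsic_lambda(gamma: float, beta: float) -> float:
    # The quantity lambda = gamma / (1 - beta)^2 from the abstract, which
    # the paper argues uniquely defines the optimisation path.
    return gamma / (1.0 - beta) ** 2

def momentum_gd(x0: float, grad, gamma: float, beta: float, steps: int) -> float:
    # Heavy-ball momentum gradient descent on a 1-D objective.
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - gamma * grad(x)
        x = x + v
    return x

# Two (step size, momentum) pairs sharing the same intrinsic lambda:
print(intrinsic_lambda(0.01, 0.9), intrinsic_lambda(0.04, 0.8))  # both approx. 1.0

# On a simple quadratic (grad x -> x), the iterates decay toward the minimizer 0.
print(momentum_gd(1.0, lambda x: x, gamma=0.1, beta=0.5, steps=500))
```

Per the paper's continuous-time analysis, pairs with equal lambda trace the same limiting path; the discrete iterates only agree approximately, which is why the demo compares lambda values rather than trajectories.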


Egalitarian Price of Fairness for Indivisible Goods

Celine, Karen Frilya, Dzulfikar, Muhammad Ayaz, Koswara, Ivan Adrian

arXiv.org Artificial Intelligence

In the context of fair division, the concept of price of fairness has been introduced to quantify the loss of welfare when we have to satisfy some fairness condition. In other words, it is the price we have to pay to guarantee fairness. Various settings of fair division have been considered previously; we extend the analysis to the setting of indivisible goods by using egalitarian welfare as the welfare measure, instead of the commonly used utilitarian welfare. We provide lower and upper bounds for various fairness and efficiency conditions such as envy-freeness up to one good (EF1) and maximum Nash welfare (MNW).