AITopics | margin maximization

The implicit bias of neural networks has been extensively studied in recent years. Lyu and Li (2019) showed that in homogeneous networks trained with the exponential or the logistic loss, gradient flow converges to a KKT point of the max margin problem in parameter space. However, that leaves open the question of whether this point will generally be an actual optimum of the max margin problem. In this paper, we study this question in detail, for several neural network architectures involving linear and ReLU activations. Perhaps surprisingly, we show that in many cases, the KKT point is not even a local optimum of the max margin problem. On the flip side, we identify multiple settings where a local or global optimum can be guaranteed.
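For orientation, the max margin problem in parameter space referred to above is commonly written as below. The notation is assumed here and does not come from this listing: \theta is the parameter vector, \Phi(\theta; x) is the network output, and (x_i, y_i) are n training pairs with labels in \{-1, 1\}. The KKT conditions are stated for differentiable \Phi; for ReLU networks the gradient is usually replaced by a subgradient.

```latex
% Max margin problem in parameter space (notation assumed for illustration)
\min_{\theta} \; \tfrac{1}{2}\,\|\theta\|_2^2
\quad \text{s.t.} \quad y_i \, \Phi(\theta; x_i) \ge 1, \qquad i = 1, \dots, n.

% A feasible \theta is a KKT point if there exist \lambda_1, \dots, \lambda_n \ge 0 with
\theta = \sum_{i=1}^{n} \lambda_i \, y_i \, \nabla_{\theta} \Phi(\theta; x_i),
\qquad
\lambda_i \bigl( y_i \, \Phi(\theta; x_i) - 1 \bigr) = 0 \quad \text{for all } i.
```

Because the constraints are non-convex in \theta, satisfying these conditions is necessary for optimality under suitable regularity assumptions but does not by itself certify a local or global optimum, which is the gap the abstract addresses.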

linear and relu network, margin maximization, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)

On Margin Maximization in Linear and ReLU Networks | Gal Vardi, TTI-Chicago and Hebrew University

Neural Information Processing Systems | Aug-19-2025, 18:17:36 GMT

The implicit bias of neural networks has been extensively studied in recent years.

artificial intelligence, converge, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois > Cook County > Chicago (0.40)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

f1298750ed09618717f9c10ea8d1d3b0-AuthorFeedback.pdf

Neural Information Processing Systems | Aug-17-2025, 05:48:27 GMT

We thank the reviewers for the detailed and insightful feedback. The reviewers noted that the paper "target[s] a timely [...]" "missing ... a bound on the target accuracy of the final classifier (in analogy to Theorem 3.2 which studies a [...] Clarification on why this is not provided or difficult to provide ... would be useful." Instead, we focus on removing the spurious features. "[theory not surprising because loss] would favor good features due to their correlation with the model", "unclear [...]" We respectfully and strongly disagree. In Figure 10, the two losses achieve equivalent empirical performance.

classifier, experiment, spurious feature, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

On Margin Maximization in Linear and ReLU Networks

Neural Information Processing Systems | Jan-19-2025, 06:25:27 GMT

The implicit bias of neural networks has been extensively studied in recent years. Lyu and Li (2019) showed that in homogeneous networks trained with the exponential or the logistic loss, gradient flow converges to a KKT point of the max margin problem in parameter space. However, that leaves open the question of whether this point will generally be an actual optimum of the max margin problem. In this paper, we study this question in detail, for several neural network architectures involving linear and ReLU activations. Perhaps surprisingly, we show that in many cases, the KKT point is not even a local optimum of the max margin problem.

linear and relu network, margin maximization, max margin problem, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex

Kumar, Tanishq, Bordelon, Blake, Pehlevan, Cengiz, Murthy, Venkatesh N., Gershman, Samuel J.

arXiv.org Artificial Intelligence | Nov-29-2024

Does learning of task-relevant representations stop when behavior stops changing? Motivated by recent theoretical advances in machine learning and the intuitive observation that human experts continue to learn from practice even after mastery, we hypothesize that task-specific representation learning can continue, even when behavior plateaus. In a novel reanalysis of recently published neural data, we find evidence for such learning in posterior piriform cortex of mice following continued training on a task, long after behavior saturates at near-ceiling performance ("overtraining"). This learning is marked by an increase in decoding accuracy from piriform neural populations and improved performance on held-out generalization tests. We demonstrate that class representations in cortex continue to separate during overtraining, so that examples that were incorrectly classified at the beginning of overtraining can abruptly be correctly classified later on, despite no changes in behavior during that time. We hypothesize this hidden yet rich learning takes the form of approximate margin maximization; we validate this and other predictions in the neural data, as well as build and interpret a simple synthetic model that recapitulates these phenomena. We conclude by showing how this model of late-time feature learning implies an explanation for the empirical puzzle of overtraining reversal in animal learning, where task-specific representations are more robust to particular task changes because the learned features can be reused.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Artificial Intelligence

2411.03541

Country: North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

Tsilivis, Nikolaos, Vardi, Gal, Kempe, Julia

arXiv.org Machine Learning | Oct-29-2024

We study the implicit bias of the general family of steepest descent algorithms, which includes gradient descent, sign descent and coordinate descent, in deep homogeneous neural networks. We prove that an algorithm-dependent geometric margin starts increasing once the networks reach perfect training accuracy and characterize the late-stage bias of the algorithms. In particular, we define a generalized notion of stationarity for optimization problems and show that the algorithms progressively reduce a (generalized) Bregman divergence, which quantifies proximity to such stationary points of a margin-maximization problem. We then experimentally zoom into the trajectories of neural networks optimized with various steepest descent algorithms, highlighting connections to the implicit bias of Adam.
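As a rough illustration of the algorithm family named in this abstract, the sketch below computes one steepest descent direction under three norms. It uses the textbook definition (the direction maximizing the inner product with the gradient over the unit ball, scaled by the dual norm) and is not taken from the paper; the function name `steepest_descent_step` and the norm labels are hypothetical.

```python
import numpy as np

def steepest_descent_step(grad, norm="l2"):
    """Return -||g||_* * argmax_{||v|| <= 1} <g, v> for the chosen norm.

    Illustrative sketch only; names and scaling convention are assumptions.
    """
    g = np.asarray(grad, dtype=float)
    if norm == "l2":
        # The l2 norm is its own dual, so the step is plain gradient descent.
        return -g
    if norm == "linf":
        # Dual norm is l1 and the maximizer is sign(g): sign descent.
        return -np.abs(g).sum() * np.sign(g)
    if norm == "l1":
        # Dual norm is l-infinity; only the largest-magnitude coordinate
        # moves, which gives (greedy) coordinate descent.
        i = int(np.argmax(np.abs(g)))
        step = np.zeros_like(g)
        step[i] = -g[i]
        return step
    raise ValueError(f"unknown norm: {norm}")

g = np.array([3.0, -4.0, 1.0])
for name in ("l2", "linf", "l1"):
    print(name, steepest_descent_step(g, name))
```

In practice sign descent is often applied without the l1 scaling factor; the scaled form is kept here so that all three cases follow the same definition.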

algorithm, descent, neural network, (14 more...)

arXiv.org Machine Learning

2410.22069

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(16 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

9461cce28ebe3e76fb4b931c35a169b0-Reviews.html

Neural Information Processing Systems | Mar-13-2024, 18:38:33 GMT

In this paper the authors provide an algorithm for directly minimizing the 0-1 loss and maximizing the margin. Most existing machine learning techniques have relied on minimizing a convex upper bound on the 0-1 loss in classification problems. In contrast, in this paper the authors propose a simple greedy algorithm for directly minimizing the 0-1 loss via a combination of weak learners. This is followed by a few steps of direct maximization of margin. The proposed algorithm is then evaluated on a few small low dimensional datasets.
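To make the phrase "convex upper bound on the 0-1 loss" concrete, the sketch below compares the 0-1 loss with two standard convex surrogates, the hinge loss and the exponential loss, as functions of the margin y * f(x). The function names and sample margins are chosen for illustration and are not from the reviewed paper.

```python
import numpy as np

def zero_one_loss(margins):
    """1 where the example is misclassified (margin <= 0), else 0."""
    return (np.asarray(margins, dtype=float) <= 0).astype(float)

def hinge_loss(margins):
    """Convex surrogate max(0, 1 - m), as used by SVMs."""
    return np.maximum(0.0, 1.0 - np.asarray(margins, dtype=float))

def exp_loss(margins):
    """Convex surrogate exp(-m), as used by AdaBoost."""
    return np.exp(-np.asarray(margins, dtype=float))

# Hypothetical margins y * f(x) for five examples.
margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("0-1:  ", zero_one_loss(margins))
print("hinge:", hinge_loss(margins))
print("exp:  ", exp_loss(margins))
```

Both surrogates sit on or above the 0-1 loss at every margin, which is why minimizing them indirectly controls the classification error.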

adaboost, algorithm, weak learner, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

Wang, Mingze, Min, Zeping, Wu, Lei

arXiv.org Artificial Intelligence | Jan-28-2024

In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an exponential rate. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow polynomial rate. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) provably fail in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
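The quantity being maximized here, the normalized margin of a linear classifier, can be sketched as below, together with plain gradient descent on the logistic loss as a baseline. This is not an implementation of PRGD, whose details are not given in this listing; the toy data, step size, and function names are assumptions for illustration.

```python
import numpy as np

def normalized_margin(w, X, y):
    """Smallest value of y_i * <w, x_i> / ||w|| over the dataset."""
    return np.min(y * (X @ w)) / np.linalg.norm(w)

def gd_logistic(X, y, w0, lr=0.5, steps=1000):
    """Plain gradient descent on the mean logistic loss (baseline, not PRGD)."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        m = y * (X @ w)                     # per-example margins
        coeff = 1.0 / (1.0 + np.exp(m))     # sigmoid(-m)
        grad = -(coeff * y) @ X / len(y)    # gradient of the mean loss
        w -= lr * grad
    return w

# Toy linearly separable data (hypothetical).
X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w_start = np.array([1.0, 0.0])
w_end = gd_logistic(X, y, w_start)
print("margin before:", normalized_margin(w_start, X, y))
print("margin after: ", normalized_margin(w_end, X, y))
```

On this data the best achievable normalized margin is sqrt(2), attained in the direction (1, 1). The normalized margin is invariant to rescaling w, so it measures the direction of the classifier rather than its norm.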

exp, neural information processing system, prgd, (13 more...)

arXiv.org Artificial Intelligence

2311.14387

Country: