AITopics | margin distribution

Spectrally-normalized margin bounds for neural networks

Neural Information Processing SystemsApr-23-2026, 17:48:28 GMT

This paper presents a margin-based multiclass generalization bound for neural networks that scales with their margin-normalized spectral complexity: their Lipschitz constant, meaning the product of the spectral norms of the weight matrices, times a certain correction factor. This bound is empirically investigated for a standard AlexNet network trained with SGD on the mnistand cifar10datasets, with both original and random labels; the bound, the Lipschitz constants, and the excess risks are all in direct correlation, suggesting both that SGD selects predictors whose complexity scales with the difficulty of the learning task, and secondly that the presented bound is sensitive to this complexity.

artificial intelligence, machine learning, margin distribution, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

A Refined Margin Distribution Analysis for Forest Representation Learning

Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou

Neural Information Processing SystemsFeb-14-2026, 13:54:35 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

1ea97de85eb634d580161c603422437f-Supplemental.pdf

Neural Information Processing SystemsFeb-7-2026, 17:45:39 GMT

dataset, densenet-121, resnet-18, (17 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

146f7dd4c91bc9d80cf4458ad6d6cd1b-Supplemental.pdf

Neural Information Processing SystemsFeb-7-2026, 13:54:30 GMT

generalization, prediction, probability, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Europe > Denmark (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

1ea97de85eb634d580161c603422437f-Supplemental.pdf

Neural Information Processing SystemsOct-2-2025, 09:44:07 GMT

Supplementary material: Hold me tight! A Theoretical margin distribution of a linear classifier 2 B Examples of frequency "flipped" images 4 C Invariance and elasticity on MNIST data 4 D Connections to catastrophic forgetting 5 E Examples of filtered images 6 F Subspace sampling of the DCT 6 G Training parameters 7 H Cross-dataset performance 8 I Margin distribution for standard networks 9 J Adversarial training parameters 13 K Description of L2-PGD attack on frequency "flipped" data 14 L Spectral decomposition on frequency "flipped" data 15 M Margin distribution for adversarially trained networks 16 N Margin distribution on random subspaces 19 We demonstrate this effect in practice by repeating the experiment of Sec. MLP we use a simple logistic regression (see Table S1).Clearly, although the values along Figure S1 shows a few example images of the frequency "flipped" versions of the standard computer We further validate our observation of Section 3.2.2 that small margin do indeed corresponds to After this, we continue training the network with a linearly decaying learning rate (max. Figure S4: Filtered image examples. Table S2 shows the performance and training parameters of the different networks used in the paper.

artificial intelligence, machine learning, resnet-18, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

146f7dd4c91bc9d80cf4458ad6d6cd1b-Supplemental.pdf

Neural Information Processing SystemsOct-2-2025, 04:33:10 GMT

artificial intelligence, generalization, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Add feedback

146f7dd4c91bc9d80cf4458ad6d6cd1b-Paper.pdf

Neural Information Processing SystemsOct-2-2025, 04:33:04 GMT

artificial intelligence, generalization, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

Add feedback

Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition

Xu, Yang, Li, Junpeng, Hua, Changchun, Yang, Yana

arXiv.org Artificial IntelligenceSep-19-2025

Abstract--The Large Margin Distribution Machine (LMDM) is a recent advancement in classifier design that optimizes not just the minimum margin (as in SVM) but the entire margin distribution, thereby improving generalization. However, existing LMDM formulations are limited to vectorized inputs and struggle with high-dimensional tensor data due to the need for flattening, which destroys the data's inherent multi-mode structure and increases computational burden. In this paper, we propose a Structure-Preserving Margin Distribution Learning for High-Order T ensor Data with Low-Rank Decomposition (SPMD-LRT) that operates directly on tensor representations without vectorization. The SPMD-LRT preserves multi-dimensional spatial structure by incorporating first-order and second-order tensor statistics (margin mean and variance) into the objective, and it leverages low-rank tensor decomposition techniques including rank-1(CP), higher-rank CP, and T ucker decomposition to parameterize the weight tensor . An alternating optimization (double-gradient descent) algorithm is developed to efficiently solve the SPMD-LRT, iteratively updating factor matrices and core tensor . This approach enables SPMD-LRT to maintain the structural information of high-order data while optimizing margin distribution for improved classification. Extensive experiments on diverse datasets (including MNIST, images and fMRI neuroimaging) demonstrate that SPMD-LRT achieves superior classification accuracy compared to conventional SVM, vector-based LMDM, and prior tensor-based SVM extensions (Support T ensor Machines and Support T ucker Machines). These results confirm the effectiveness and robustness of SPMD-LRT in handling high-dimensional tensor data for classification. Dvances in data acquisition have led to an abundance of high-order tensor data (multi-dimensional arrays) across various domains, such as video sequences, medical imaging, and spatiotemporal sensor readings. Effectively learning from such tensor-structured data has become a pressing research focus [1] [2]. The multi-dimensional structure of tensors offers rich information (e.g.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2509.14577

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Neurology > Attention Deficit/Hyperactivity Disorder (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Refined Margin Distribution Analysis for Forest Representation Learning

Shen-Huan Lyu, Liang Yang, Zhi-Hua Zhou

Neural Information Processing SystemsAug-20-2025, 05:25:28 GMT

Neural Information Processing Systems http://nips.cc/

casforest model, dataset, latexit sha1, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks

D'Amico, Francesco, Bocchi, Dario, Negri, Matteo

arXiv.org Machine LearningMay-20-2025

Scaling laws in deep learning - empirical power-law relationships linking model performance to resource growth - have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training or on the optimal training time given the model size. In this work, we uncover a richer picture by analyzing the entire training dynamics through the lens of spectral complexity norms. We identify two novel dynamical scaling laws that govern how performance evolves during training. These laws together recover the well-known test error scaling at convergence, offering a mechanistic explanation of generalization emergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10 and CIFAR-100. Furthermore, we provide analytical support using a solvable model: a single-layer perceptron trained with binary cross-entropy. In this setting, we show that the growth of spectral complexity driven by the implicit bias mirrors the generalization behavior observed at fixed norm, allowing us to connect the performance dynamics to classical learning rules in the perceptron.

artificial intelligence, machine learning, perceptron, (17 more...)

arXiv.org Machine Learning

2505.1323

Country: