AITopics | Statistical Learning

Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks

The lottery ticket hypothesis (LTH) [20] states that learning on a properly pruned network (the winning ticket) improves test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of applications involving deep neural networks (DNNs), such as computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work is the first to characterize the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity needed to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, when the algorithm for training a pruned neural network is specified as an (accelerated) stochastic gradient descent algorithm, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are derived for a pruned neural network with one hidden layer, while experimental results are further provided to support the implications for pruning multi-layer neural networks.
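As a rough illustration of what "training a pruned neural network" with one hidden layer can look like, the sketch below applies a fixed binary mask to the hidden-layer weights and updates only the unpruned entries. It is a minimal sketch under stated assumptions, not the paper's algorithm: it uses plain full-batch gradient descent instead of accelerated stochastic gradient descent, and the names `forward`, `train_pruned`, and `mask`, the averaging output layer, the ReLU activation, and all hyperparameters are illustrative choices.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(W, X):
    """One-hidden-layer ReLU network whose output averages the K hidden units (assumed form)."""
    return relu(X @ W.T).mean(axis=1)

def train_pruned(X, y, mask, lr=0.1, steps=500, seed=0):
    """Full-batch gradient descent on squared loss, updating only unpruned weights."""
    rng = np.random.default_rng(seed)
    K, d = mask.shape
    n = X.shape[0]
    W = rng.normal(scale=0.5, size=(K, d)) * mask      # pruned entries start at zero
    losses = []
    for _ in range(steps):
        H = X @ W.T                                    # pre-activations, shape (n, K)
        resid = relu(H).mean(axis=1) - y               # prediction error, shape (n,)
        losses.append(0.5 * np.mean(resid ** 2))
        # Gradient of 0.5 * mean(resid^2) with respect to W, shape (K, d)
        G = ((resid[:, None] * (H > 0)).T @ X) / (n * K)
        W -= lr * G * mask                             # masked step keeps pruned weights at zero
    return W, losses
```

In this setup the number of trainable parameters equals `mask.sum()`, which is the quantity the abstract relates to the required number of samples.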

artificial intelligence, machine learning, neural network, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Michigan (0.28)

Genre: Contests & Prizes (1.00)

Industry: Leisure & Entertainment > Gambling (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Cardinality-Regularized Hawkes-Granger Model

Neural Information Processing Systems | Apr-24-2026, 20:32:47 GMT

This section provides parameter estimation equations in the MM procedure Eq. (13) for the baseline intensity µ and the decay parameter β, which were omitted in the main text due to space limitations. Below, we provide results for the exponential and power distributions. This section describes the details of the experiments. We have included the Sparse5 and Dense10 data sets and the Python code to generate them as part of the final submission. B.1 Data generation. Sparse5: The Sparse5 benchmark dataset is designed to have the simplest nontrivial kind of causal structure, which should be easily reproduced by any Granger-causal learning algorithm.
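For orientation, the sketch below shows how a baseline intensity µ and a decay parameter β typically enter a univariate Hawkes intensity with an exponential decay kernel. It is a generic textbook form under stated assumptions, not the paper's Eq. (13) or its estimation procedure: the function name `hawkes_intensity`, the triggering weight `alpha`, and the kernel normalization `beta * exp(-beta * dt)` are all assumptions made for illustration.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity at time t: baseline mu plus exponentially decaying pushes from past events.

    Assumed form: lambda(t) = mu + alpha * sum_i beta * exp(-beta * (t - t_i)) over t_i < t.
    """
    events = np.asarray(events, dtype=float)
    past = events[events < t]                  # only strictly earlier events contribute
    return mu + alpha * beta * np.exp(-beta * (t - past)).sum()
```

With no past events the intensity reduces to the baseline `mu`; a larger `beta` makes each event's influence fade faster.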

accuracy, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

15cf76466b97264765356fcc56d801d1-Paper.pdf

Neural Information Processing Systems | Apr-24-2026, 20:32:44 GMT

data mining, hawke process, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.95)

Industry:

Information Technology (1.00)
Energy (1.00)
Health & Medicine (0.93)
Government > Regional Government > North America Government > United States Government (0.69)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

0e5b96f97c1813bb75f6c28532c2ecc7-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 20:32:05 GMT

artificial intelligence, machine learning, objective, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Robots (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Explicit loss asymptotics in the gradient descent training of neural networks

Neural Information Processing Systems | Apr-24-2026, 19:57:02 GMT

Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically have the form of rigorous but potentially loose bounds on the loss values. In the present work we take a different approach and show that the learning trajectory of a wide network in a lazy training regime can be characterized by an explicit asymptotic at large training times. Specifically, the leading term in the asymptotic expansion of the loss behaves as a power law $L(t) \sim C t^{-\xi}$ with exponent $\xi$ expressed only through the data dimension, the smoothness of the activation function, and the class of functions being approximated. Our results are based on spectral analysis of the integral operator representing the linearized evolution of a large network trained on the expected loss. Importantly, the techniques we employ do not require a specific form of the data distribution, for example Gaussian, thus making our findings sufficiently universal.
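A power law of the form L(t) ≈ C t^(-ξ) becomes a straight line in log-log coordinates, so the exponent can be read off an observed loss curve with a linear fit. The snippet below is a minimal sketch of that check, not a procedure from the paper; the function name `fit_power_law` is hypothetical, and the fit is only meaningful on the late-time portion of a curve where the leading term dominates.

```python
import numpy as np

def fit_power_law(t, loss):
    """Least-squares fit of log(loss) = log(C) - xi * log(t); returns (C, xi)."""
    t = np.asarray(t, dtype=float)
    loss = np.asarray(loss, dtype=float)
    slope, intercept = np.polyfit(np.log(t), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic example: an exact power law with C = 2 and xi = 0.5 (illustrative values)
t = np.arange(1.0, 1001.0)
C_hat, xi_hat = fit_power_law(t, 2.0 * t ** -0.5)
```

On real training curves, restricting the fit to large `t` avoids contamination from early-time transients.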

artificial intelligence, machine learning, singularity, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.70)

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

Neural Information Processing Systems | Apr-24-2026, 19:56:01 GMT

Self-supervised learning (SSL), including the mainstream contrastive learning, has achieved great success in learning visual representations without data annotations. However, most methods focus mainly on instance-level information (i.e., the different augmented images of the same instance should have the same feature or cluster into the same class), and pay little attention to the relationships between different instances. In this paper, we introduce a novel SSL paradigm, which we term the relational self-supervised learning (ReSSL) framework, that learns representations by modeling the relationship between different instances. Specifically, our proposed method employs a sharpened distribution of pairwise similarities among different instances as the relation metric, which is then used to match the feature embeddings of different augmentations. Moreover, to boost performance, we argue that weak augmentations matter for representing a more reliable relation, and we leverage a momentum strategy for practical efficiency. Experimental results show that our proposed ReSSL significantly outperforms the previous state-of-the-art algorithms in terms of both performance and training efficiency.
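The idea of matching embeddings through a sharpened similarity distribution can be sketched as a cross-entropy between two softmax distributions over similarities to a set of other instances. The code below is a minimal sketch under stated assumptions, not the authors' implementation: the names `relation_loss` and `bank`, the use of cosine similarity, and the temperature values are illustrative, with the lower `tau_teacher` playing the role of the sharpening.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def relation_loss(z_student, z_teacher, bank, tau_student=0.1, tau_teacher=0.04):
    """Cross-entropy between teacher and student similarity distributions over `bank`.

    z_teacher: embeddings of one augmented view (the target relation),
    z_student: embeddings of another view of the same images,
    bank:      embeddings of other instances used as reference points.
    """
    zs, zt, q = l2_normalize(z_student), l2_normalize(z_teacher), l2_normalize(bank)
    target = np.exp(log_softmax(zt @ q.T / tau_teacher))    # sharpened target distribution
    log_pred = log_softmax(zs @ q.T / tau_student)          # student log-probabilities
    return float(-(target * log_pred).sum(axis=1).mean())
```

In a training loop the target branch would typically be treated as a constant (no gradient), which is where a momentum-updated encoder would fit.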

artificial intelligence, inductive learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report (0.48)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Neural Information Processing Systems | Apr-24-2026, 19:55:22 GMT

C.9 Further Analysis on Existing Post-Processing Calibration Methods. We compare NORCAL to the existing post-calibration methods in the main paper (cf.

artificial intelligence, machine learning, norcal, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
