AITopics | Statistical Learning

Implicit Bias of Gradient Descent on Linear Convolutional Networks

Neural Information Processing SystemsMar-16-2026, 17:59:11 GMT

We show that gradient descent on full-width linear convolutional networks of depth $L$ converges to a linear predictor related to the $\ell_{2/L}$ bridge penalty in the frequency domain. This is in contrast to linearly fully connected networks, where gradient descent converges to the hard margin linear SVM solution, regardless of depth.

artificial intelligence, machine learning, proceedings, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.97)

Add feedback

Analytic solution and stationary phase approximation for the Bayesian lasso and elastic net

Neural Information Processing SystemsMar-16-2026, 17:26:16 GMT

The lasso and elastic net linear regression models impose a double-exponential prior distribution on the model parameters to achieve regression shrinkage and variable selection, allowing the inference of robust models from large data sets. However, there has been limited success in deriving estimates for the full posterior distribution of regression coefficients in these models, due to a need to evaluate analytically intractable partition function integrals. Here, the Fourier transform is used to express these integrals as complex-valued oscillatory integrals over regression frequencies. This results in an analytic expansion and stationary phase approximation for the partition functions of the Bayesian lasso and elastic net, where the non-differentiability of the double-exponential prior has so far eluded such an approach. Use of this approximation leads to highly accurate numerical estimates for the expectation values and marginal posterior distributions of the regression coefficients, and allows for Bayesian inference of much higher dimensional models than previously possible.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Proximal SCOPE for Distributed Sparse Learning

Shenyi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li

Neural Information Processing SystemsMar-16-2026, 14:46:21 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, partition, (14 more...)

Neural Information Processing Systems

Country:

North America (0.28)
Asia > China (0.16)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)

Add feedback

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms

Neural Information Processing SystemsMar-16-2026, 06:11:17 GMT

Semi-supervised learning (SSL) provides a powerful framework for leveraging unlabeled data when labels are limited or expensive to obtain. SSL algorithms based on deep neural networks have recently proven successful on standard benchmark tasks. However, we argue that these benchmarks fail to address many issues that SSL algorithms would face in real-world applications. After creating a unified reimplementation of various widely-used SSL techniques, we test them in a suite of experiments designed to address these issues. We find that the performance of simple baselines which do not use unlabeled data is often underreported, SSL methods differ in sensitivity to the amount of labeled and unlabeled data, and performance can degrade substantially when the unlabeled dataset contains out-ofdistribution examples. To help guide SSL research towards real-world applicability, we make our unified reimplemention and evaluation platform publicly available.2

artificial intelligence, machine learning, unlabeled data, (15 more...)

Neural Information Processing Systems

Country: North America > Canada (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Theoretical guarantees for EM under misspecified Gaussian mixture models

Raaz Dwivedi, nhật Hồ, Koulik Khamaru, Martin J. Wainwright, Michael I. Jordan

Neural Information Processing SystemsMar-16-2026, 03:04:25 GMT

Recent years have witnessed substantial progress in understanding the behavior of EM for mixture models that are correctly specified. Given that model misspecification is common in practice, it is important to understand EM in this more general setting. We provide non-asymptotic guarantees for the population and sample-based EM algorithms when used to estimate parameters of certain misspecified Gaussian mixture models.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)

Add feedback

Robust Hypothesis Testing Using Wasserstein Uncertainty Sets

RUI GAO, Liyan Xie, Yao Xie, Huan Xu

Neural Information Processing SystemsMar-16-2026, 01:06:44 GMT

We develop a novel computationally efficient and general framework for robust hypothesis testing. The new framework features a new way to construct uncertainty sets under the null and the alternative distributions, which are sets centered around the empirical distribution defined via Wasserstein metric, thus our approach is data-driven and free of distributional assumptions. We develop a convex safe approximation of the minimax formulation and show that such approximation renders a nearly-optimal detector among the family of all possible tests. By exploiting the structure of the least favorable distribution, we also develop a tractable reformulation of such approximation, with complexity independent of the dimension of observation space and can be nearly sample-size-independent in general. Real-data example using human activity data demonstrated the excellent performance of the new robust detector.

artificial intelligence, detector, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

a07c2f3b3b907aaf8436a26c6d77f0a2-Paper.pdf

Neural Information Processing SystemsMar-16-2026, 01:06:30 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.32)

Add feedback

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD

Zeyuan Allen-Zhu

Neural Information Processing SystemsMar-16-2026, 00:05:33 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, convex, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

Murden, Raphiel J., Tian, Ganzhong, Qiu, Deqiang, Risk, Benajmin B.

arXiv.org Machine LearningMar-16-2026

Collecting multiple types of data on the same set of subjects is common in modern scientific applications including, genomics, metabolomics, and neuroimaging. Joint and Individual Variance Explained (JIVE) seeks a low-rank approximation of the joint variation between two or more sets of features captured on common subjects and isolates this variation from that unique to eachset of features. We develop an expectation-maximization (EM) algorithm to estimate a probabilistic model for the JIVE framework. The model extends probabilistic principal components analysis to multiple data sets. Our maximum likelihood approach simultaneously estimates joint and individual components, which can lead to greater accuracy compared to other methods. We apply ProJIVE to measures of brain morphometry and cognition in Alzheimer's disease. ProJIVE learns biologically meaningful courses of variation, and the joint morphometry and cognition subject scores are strongly related to more expensive existing biomarkers. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Code to reproduce the analysis is available on our GitHub page.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1080/10618600.2026.2639081

2603.12351

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

EB-RANSAC: Random Sample Consensus based on Energy-Based Model

Yasuda, Muneki, Watanabe, Nao, Sekimoto, Kaiji

arXiv.org Machine LearningMar-16-2026

Random sample consensus (RANSAC), which is based on a repetitive sampling from a given dataset, is one of the most popular robust estimation methods. In this study, an energy-based model (EBM) for robust estimation that has a similar scheme to RANSAC, energy-based RANSAC (EB-RANSAC), is proposed. EB-RANSAC is applicable to a wide range of estimation problems similar to RANSAC. However, unlike RANSAC, EB-RANSAC does not require a troublesome sampling procedure and has only one hyperparameter. The effectiveness of EB-RANSAC is numerically demonstrated in two applications: a linear regression and maximum likelihood estimation.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

2603.12525

Country: