AITopics | Statistical Learning

2b6bb5354a56ce256116b6b307a1ea10-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 06:33:07 GMT

artificial intelligence, machine learning, ridge regression, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

The Benefits of Implicit Regularization from SGD in Least Squares Problems

Neural Information Processing SystemsApr-25-2026, 06:33:03 GMT

Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make sharp instance-based comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression. For a broad class of least squares problem instances (that are natural in high-dimensional settings), we show: (1) for every problem instance and for every ridge parameter, (unregularized) SGD, when provided with logarithmically more samples than that provided to the ridge algorithm, generalizes no worse than the ridge solution (provided SGD uses a tuned constant stepsize); (2) conversely, there exist instances (in this wide problem class) where optimally-tuned ridge regression requires quadratically more samples than SGD in order to have the same generalization performance. Taken together, our results show that, up to the logarithmic factors, the generalization performance of SGD is always no worse than that of ridge regression in a wide range of overparameterized problems, and, in fact, could be much better for some problem instances. More generally, our results show how algorithmic regularization has important consequences even in simpler (overparameterized) convex settings.

artificial intelligence, machine learning, ridge regression, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

160adf2dc118a920e7858484b92a37d8-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:17:41 GMT

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

2b6921f2c64dee16ba21ebf17f3c2c92-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 06:17:31 GMT

artificial intelligence, bayesian inference, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

2b3bb2c95195130977a51b3bb251c40a-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:15:05 GMT

artificial intelligence, machine learning, representation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
North America > Canada (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.46)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

2b0f658cbffd284984fb11d90254081f-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 06:14:56 GMT

artificial intelligence, machine learning, registration, (13 more...)

Neural Information Processing Systems

Genre: Research Report (0.93)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.70)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Add feedback

Revisiting Active Sets for Gaussian Process Decoders

Neural Information Processing SystemsApr-25-2026, 06:03:22 GMT

Decoders built on Gaussian processes (GPs) are enticing due to the marginalisation over the non-linear function space. Such models (also known as GP-LVMs) are often expensive and notoriously difficult to train in practice, but can be scaled using variational inference and inducing points. In this paper, we revisit active set approximations. We develop a new stochastic estimate of the log-marginal likelihood based on recently discovered links to cross-validation, and we propose a computationally efficient approximation thereof. We demonstrate that the resulting stochastic active sets (SAS) approximation significantly improves the robustness of GP decoder training, while reducing computational cost. The SAS-GP obtains more structure in the latent space, scales to many datapoints, and learns better representations than variational autoencoders, which is rarely the case for GP decoders.

approximation, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

2b2011a7d5396faf5899863d896a3c24-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 06:03:04 GMT

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe (0.68)
North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Supplementary Material: " Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition "

Neural Information Processing SystemsApr-25-2026, 06:02:35 GMT

The input tensor shape is 6 3 3. The corresponding weight matrix has f = 20 rows (one row per filter) and 24 columns (c κ1 κ2), as for the corresponding feature matrix, it has 24 rows and 4 columns, the 4 here is the number of convolution windows (i.e., number of pixels/entries in each of the output feature maps). After multiplying those matrices, we reshape them to the desired shape to obtain the desired output feature maps. In this section, we provide more details pertaining to our method. A.1 Method Preliminaries Our layer-wise compression technique hinges upon the insight that any linear layer may be cast as a matrix multiplication, which enables us to rely on SVD as compression subroutine. Focusing on convolutions we show how such a layer can be recast as matrix multiplication. Similar approaches have been used by Denton et al. (2014); Idelbayev and Carreira-Perpinán (2020); Wen et al. (2017) among others. The equivalence of Y and Y can be easily established via an appropriate reshaping operation since p= p1p2. Equipped with the notion of correspondence between convolution and matrix multiplication our goal is to decompose the layer via its matrix operator W Rf cκ1κ2. To this end, we compute the j-rank approximation of W using SVD and factor it into a pair of smaller matrices U Rf j and V Rj cκ1κ2.

artificial intelligence, compression ratio, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

First-Order Algorithms for Min-Max Optimization in Geodesic Metric Spaces

Neural Information Processing SystemsApr-25-2026, 06:01:59 GMT

From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative--we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically stronglyconvex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.

artificial intelligence, machine learning, manifold, (15 more...)

Neural Information Processing Systems

Country: Asia (0.15)

Genre: Research Report > New Finding (0.88)

Technology: