AITopics | square problem

Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which have been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make sharp instance-based comparisons of the implicit regularization afforded by (unregularized) averaged SGD with the explicit regularization of ridge regression. For a broad class of least squares problem instances (that are natural in high-dimensional settings), we show: (1) for every problem instance and for every ridge parameter, (unregularized) SGD, when provided with logarithmically more samples than are provided to the ridge algorithm, generalizes no worse than the ridge solution (provided SGD uses a tuned constant stepsize); (2) conversely, there exist instances (in this wide problem class) where optimally-tuned ridge regression requires quadratically more samples than SGD in order to have the same generalization performance. Taken together, our results show that, up to logarithmic factors, the generalization performance of SGD is always no worse than that of ridge regression in a wide range of overparameterized problems, and, in fact, could be much better for some problem instances. More generally, our results show how algorithmic regularization has important consequences even in simpler (overparameterized) convex settings.
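To make the two estimators being compared concrete, here is a minimal NumPy sketch of one-pass, constant-stepsize SGD with iterate averaging next to closed-form ridge regression. It is only an illustration: the function names, the synthetic noiseless data, the stepsize, the ridge parameter, and the choice to average over all iterates are assumptions made here, not details taken from the paper.

```python
import numpy as np

def averaged_sgd(X, y, stepsize):
    """One pass of constant-stepsize SGD on the squared loss.

    Returns the average of the iterates (no explicit regularization).
    """
    n, d = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for i in range(n):
        residual = X[i] @ w - y[i]            # prediction error on sample i
        w = w - stepsize * residual * X[i]    # stochastic gradient step
        w_sum += w
    return w_sum / n

def ridge(X, y, lam):
    """Closed-form ridge solution of (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Hypothetical synthetic instance: Gaussian features, noiseless labels.
rng = np.random.default_rng(0)
n, d = 2000, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star

w_sgd = averaged_sgd(X, y, stepsize=0.05)
w_ridge = ridge(X, y, lam=1.0)
print("SGD error:  ", np.linalg.norm(w_sgd - w_star))
print("ridge error:", np.linalg.norm(w_ridge - w_star))
```

The sketch only shows the mechanics of each estimator on an easy instance; it does not reproduce the sample-complexity comparison stated in the abstract.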

artificial intelligence, machine learning, ridge regression, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

A Fast Scale-Invariant Algorithm for Non-negative Least Squares with Non-negative Data

Neural Information Processing Systems, Apr-25-2026, 04:58:40 GMT

Nonnegative (linear) least squares problems are a fundamental class of problems that is well studied in statistical learning and for which solvers have been implemented in many of the standard programming languages used within the machine learning community. The existing off-the-shelf solvers view the non-negativity constraint in these problems as an obstacle and, compared to unconstrained least squares, perform additional effort to address it. However, in many of the typical applications, the data itself is nonnegative as well, and we show that the nonnegativity in this case makes the problem easier. In particular, while the worst-case dimension-independent oracle complexity for unconstrained least squares problems necessarily scales with one of the data matrix constants (typically the spectral norm) and these problems are solved to additive error, we show that nonnegative least squares problems with nonnegative data are solvable to multiplicative error and with complexity independent of any matrix constants. The algorithm we introduce is accelerated and based on a primal-dual perspective. We further show how to provably obtain linear convergence using adaptive restart coupled with our method and demonstrate its effectiveness on large-scale data via numerical experiments.
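For orientation, the problem itself can be stated in a few lines. The sketch below solves a small nonnegative least squares instance with plain projected gradient descent, which is a standard baseline and not the accelerated primal-dual method the abstract describes. Note that its stepsize depends on the spectral norm of the data matrix, the kind of matrix constant the abstract says the proposed method avoids. The function name, iteration count, and synthetic data are assumptions for illustration.

```python
import numpy as np

def nnls_projected_gradient(A, b, num_iters=5000):
    """Minimize 0.5 * ||A x - b||^2 subject to x >= 0 by projected gradient."""
    # Lipschitz constant of the gradient: squared spectral norm of A.
    lipschitz = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)
        # Gradient step followed by projection onto the nonnegative orthant.
        x = np.maximum(x - grad / lipschitz, 0.0)
    return x

# Hypothetical instance with nonnegative data and a nonnegative solution.
rng = np.random.default_rng(0)
A = rng.uniform(0.0, 1.0, size=(50, 4))
x_true = np.array([1.0, 0.0, 2.0, 0.5])
b = A @ x_true

x_hat = nnls_projected_gradient(A, b)
print(x_hat)
```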

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.15)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Efficient Globally Convergent Stochastic Optimization for Canonical Correlation Analysis

Weiran Wang, Jialei Wang, Dan Garber, Nati Srebro

Neural Information Processing Systems, Mar-23-2026, 07:19:05 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Efficient Leverage Score Sampling for Tensor Train Decomposition

Neural Information Processing Systems, Feb-16-2026, 09:03:35 GMT

In this paper, we propose an efficient algorithm to accelerate computing the TT decomposition with the Alternating Least Squares (ALS) algorithm relying on exact leverage scores sampling.
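As background, each ALS step is an ordinary least squares solve, and leverage score sampling replaces it with a smaller, row-sampled problem. The sketch below shows the generic dense-matrix version of that idea, assuming a full-column-rank matrix; it does not exploit any tensor-train structure and is not the paper's algorithm. The helper names and the test instance are hypothetical.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores: squared row norms of Q from a thin QR of A."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q ** 2, axis=1)

def sampled_least_squares(A, b, num_samples, rng):
    """Solve min ||A x - b|| on rows sampled proportionally to leverage scores."""
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=probs)
    # Rescale sampled rows so the sketched objective is unbiased.
    scale = 1.0 / np.sqrt(num_samples * probs[idx])
    x, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x

# Hypothetical instance: tall matrix with a small amount of label noise.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=1000)

x_sketch = sampled_least_squares(A, b, num_samples=200, rng=rng)
print(np.linalg.norm(x_sketch - x_true))
```

Computing exact scores through a QR factorization costs as much as solving the full problem, which is why structured methods that avoid forming the matrix are of interest.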

artificial intelligence, decomposition, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Africa > Senegal > Kolda Region > Kolda (0.05)
North America > United States > District of Columbia > Washington (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report (1.00)
Workflow (0.68)

Industry:

Energy (0.93)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition

Neural Information Processing Systems, Feb-15-2026, 23:56:20 GMT

As a result, it tractably draws samples even when the matrices forming the Khatri-Rao product have tens of millions of rows each.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Africa > Senegal > Kolda Region > Kolda (0.05)
Europe > Germany > Hesse > Darmstadt Region > Frankfurt (0.04)

Industry:

Energy (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.51)

efcb76ac1df9231a24893a957fcb9001-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 18:36:23 GMT

batch fraction, convergence rate, momentum, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.15)
Europe > Russia (0.04)
Asia > Russia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Total Least Squares Regression in Input Sparsity Time

Huaian Diao, Zhao Song, David Woodruff, Xin Yang

Neural Information Processing Systems, Feb-12-2026, 05:03:27 GMT

In the total least squares problem, one is given an m × n matrix A and an m × d matrix B, and one seeks to "correct" both A and B, obtaining matrices Â and B̂, so that there exists an X satisfying the equation ÂX = B̂. Typically the problem is overconstrained, meaning that m ≫ max(n, d).
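The classical textbook way to solve this problem uses a full SVD of the stacked matrix [A B]: the corrected pair [Â B̂] is its best rank-n approximation, and X is read off from the trailing right singular vectors. The sketch below follows that approach under the assumption that the lower-right d × d block of V is invertible. It is far slower than input-sparsity time and is not the algorithm of the listed paper; the function name and test data are hypothetical.

```python
import numpy as np

def total_least_squares(A, B):
    """Classical SVD-based TLS. Returns (X, A_hat, B_hat) with A_hat @ X = B_hat."""
    n = A.shape[1]
    C = np.hstack([A, B])                        # m x (n + d)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    V = Vt.T
    V12 = V[:n, n:]                              # n x d block
    V22 = V[n:, n:]                              # d x d block, assumed invertible
    # X = -V12 @ inv(V22), computed with a linear solve.
    X = -np.linalg.solve(V22.T, V12.T).T
    # Best rank-n approximation of [A B] gives the corrected matrices.
    C_hat = (U[:, :n] * s[:n]) @ Vt[:n]
    return X, C_hat[:, :n], C_hat[:, n:]

# Hypothetical overconstrained instance with noise in both A and B.
rng = np.random.default_rng(0)
m, n, d = 100, 3, 2
A_clean = rng.normal(size=(m, n))
X_true = rng.normal(size=(n, d))
A = A_clean + 0.01 * rng.normal(size=(m, n))
B = A_clean @ X_true + 0.01 * rng.normal(size=(m, d))

X, A_hat, B_hat = total_least_squares(A, B)
print(np.linalg.norm(X - X_true))
```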

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Virginia (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

4cf0ed8641cfcbbf46784e620a0316fb-Paper.pdf

0bed45bd5774ffddc95ffe500024f628-Paper.pdf

2b6bb5354a56ce256116b6b307a1ea10-Supplemental.pdf

The Benefits of Implicit Regularization from SGD in Least Squares Problems
