Daniel J. Hsu
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
Mikhail Belkin, Daniel J. Hsu, Partha Mitra
Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error. This phenomenon of strong generalization performance for "overfitted" / interpolated classifiers appears to be ubiquitous in high-dimensional data, having been observed in deep networks, kernel machines, boosting, and random forests. Their performance is consistently robust even when the data contain large amounts of label noise. Very little theory is available to explain these observations; the vast majority of theoretical analyses of generalization allow for interpolation only when there is little or no label noise. This paper takes a step toward a theoretical foundation for interpolated classifiers by analyzing local interpolating schemes, including a geometric simplicial interpolation algorithm and singularly weighted k-nearest neighbor schemes. Consistency or near-consistency is proved for these schemes in classification and regression problems.
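Below is a minimal sketch of a singularly weighted k-nearest neighbor regressor in the spirit of the interpolating schemes analyzed in this paper; the weighting rule and the parameter names (k, delta) are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def singular_knn_predict(X_train, y_train, X_query, k=5, delta=2.0, eps=1e-12):
    """k-NN regression with singular weights w_i(x) = ||x - x_i||^(-delta).

    Because the weights diverge as a query point approaches a training point,
    the fitted function interpolates the training labels while still averaging
    over k neighbors away from the data.
    """
    preds = np.empty(len(X_query))
    for j, x in enumerate(X_query):
        dists = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(dists)[:k]            # indices of the k nearest neighbors
        if dists[nn[0]] < eps:                # query is (numerically) a training point
            preds[j] = y_train[nn[0]]
            continue
        w = dists[nn] ** (-delta)             # singular weights blow up near training points
        preds[j] = np.dot(w, y_train[nn]) / w.sum()
    return preds

# toy usage: noisy 1-d regression; the fit reproduces the training labels exactly
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(50)
print(singular_knn_predict(X, y, X[:3]))  # matches y[:3]
```

Away from the training points the prediction is an ordinary weighted k-NN average, but the singular weights force the estimator to pass through every training label, which is the interpolation property the paper studies.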
Global Analysis of Expectation Maximization for Mixtures of Two Gaussians
Ji Xu, Daniel J. Hsu, Arian Maleki
Expectation Maximization (EM) is among the most popular algorithms for estimating parameters of statistical models. However, EM, which is an iterative algorithm based on the maximum likelihood principle, is generally only guaranteed to find stationary points of the likelihood objective, and these points may be far from any maximizer. This article addresses this disconnect between the statistical principles behind EM and its algorithmic properties. Specifically, it provides a global analysis of EM for specific models in which the observations comprise an i.i.d. sample from a mixture of two Gaussians.
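As a concrete illustration, here is a minimal sample-based sketch of the EM iteration for the symmetric two-component model 0.5*N(theta, I) + 0.5*N(-theta, I) with unit covariance and known, equal mixing weights; the paper's analysis concerns the idealized (population) version of this update, and the setup below is an assumption chosen for illustration.

```python
import numpy as np

def em_two_gaussians(X, theta0, n_iters=100):
    """Sample-based EM for the mixture 0.5*N(theta, I) + 0.5*N(-theta, I).

    E-step: the responsibility of the +theta component at x is
            sigmoid(2 <theta, x>), so 2*resp - 1 = tanh(<theta, x>).
    M-step: theta <- average of tanh(<theta, x_i>) * x_i.
    """
    theta = theta0.astype(float).copy()
    for _ in range(n_iters):
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
    return theta

# toy usage: data drawn from the symmetric two-component mixture
rng = np.random.default_rng(1)
theta_star = np.array([1.0, -0.5])
n = 5000
signs = rng.choice([-1.0, 1.0], size=n)
X = signs[:, None] * theta_star + rng.standard_normal((n, 2))
print(em_two_gaussians(X, theta0=rng.standard_normal(2)))  # close to +/- theta_star
```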
Linear regression without correspondence
Daniel J. Hsu, Kevin Shi, Xiaorui Sun
This article considers algorithmic and statistical aspects of linear regression when the correspondence between the covariates and the responses is unknown. First, a fully polynomial-time approximation scheme is given for the natural least squares optimization problem in any constant dimension. Next, in an average-case and noise-free setting where the responses exactly correspond to a linear function of i.i.d. standard Gaussian covariates, an efficient algorithm based on lattice basis reduction is shown to exactly recover the unknown linear function.
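To make the objective concrete, here is a brute-force sketch of the "shuffled" least squares problem, minimizing ||y_pi - X w||^2 over permutations pi and weights w; it is exponential in n and is not the paper's algorithm, only an illustration of the problem being solved.

```python
import itertools
import numpy as np

def shuffled_least_squares_bruteforce(X, y):
    """Exhaustive search for  min_{pi, w} || y_pi - X w ||^2.

    Only feasible for tiny n; the paper gives much more efficient algorithms.
    """
    best = (np.inf, None, None)
    for pi in itertools.permutations(range(len(y))):
        y_pi = y[list(pi)]
        w, _, _, _ = np.linalg.lstsq(X, y_pi, rcond=None)  # least squares for this ordering
        cost = np.sum((y_pi - X @ w) ** 2)
        if cost < best[0]:
            best = (cost, np.array(pi), w)
    return best  # (cost, permutation, weights)

# toy usage: noise-free responses observed in shuffled order
rng = np.random.default_rng(2)
n, d = 7, 2
X = rng.standard_normal((n, d))
w_true = np.array([2.0, -1.0])
y = rng.permutation(X @ w_true)
cost, pi, w_hat = shuffled_least_squares_bruteforce(X, y)
print(cost, w_hat)  # cost ~ 0, w_hat ~ w_true
```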
Benefits of over-parameterization with EM
Ji Xu, Daniel J. Hsu, Arian Maleki
Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. We consider the problem of estimating the mean vectors of a Gaussian mixture model in a scenario where the mixing weights are known. Our study shows that the global behavior of EM is better when one uses an over-parameterized model in which the mixing weights are treated as unknown than when one uses the (correct) model with the mixing weights fixed to their known values. For symmetric Gaussian mixtures with two components, we prove that introducing the (statistically redundant) weight parameters enables EM to find the global maximizer of the log-likelihood starting from almost any initial mean parameters, whereas EM without this over-parameterization may very often fail. For other Gaussian mixtures, we provide empirical evidence showing similar behavior. Our results corroborate the value of over-parameterization in solving non-convex optimization problems, previously observed in other domains.
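A minimal sketch of the two EM variants being compared, on a 1-d two-component Gaussian mixture with unit variances and a known mixing weight; the dimension, initialization, and parameter names below are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

def em_mixture_two(X, mu_init, pi_known=0.3, update_weights=False, n_iters=200):
    """EM for the 1-d mixture  pi * N(mu1, 1) + (1 - pi) * N(mu2, 1).

    If update_weights is False, the mixing weight stays fixed at the known
    value (the "correct" model); if True, it is also re-estimated each
    iteration (the over-parameterized model).
    """
    mu1, mu2 = map(float, mu_init)
    pi = pi_known
    for _ in range(n_iters):
        # E-step: responsibilities of component 1
        p1 = pi * np.exp(-0.5 * (X - mu1) ** 2)
        p2 = (1.0 - pi) * np.exp(-0.5 * (X - mu2) ** 2)
        r = p1 / (p1 + p2)
        # M-step: update the means (and optionally the weight)
        mu1 = np.sum(r * X) / np.sum(r)
        mu2 = np.sum((1.0 - r) * X) / np.sum(1.0 - r)
        if update_weights:
            pi = np.mean(r)
    return mu1, mu2, pi

# toy usage: same initialization for both variants
rng = np.random.default_rng(3)
pi_star, mu_star = 0.3, (-2.0, 2.0)
n = 20000
z = rng.random(n) < pi_star
X = np.where(z, mu_star[0], mu_star[1]) + rng.standard_normal(n)
init = (3.0, 4.0)
print(em_mixture_two(X, init, pi_known=pi_star, update_weights=False))
print(em_mixture_two(X, init, pi_known=pi_star, update_weights=True))
```

The `update_weights=True` variant is the over-parameterized one: it re-estimates a mixing weight that is already known, and this extra freedom is what the paper argues helps EM escape spurious fixed points of the fixed-weight iteration.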