
Collaborating Authors

 Kur, Gil


Early-Stopped Mirror Descent for Linear Regression over Convex Bodies

arXiv.org Machine Learning

Early-stopped iterative optimization methods are widely used as alternatives to explicit regularization, and direct comparisons between early-stopping and explicit regularization have been established for many optimization geometries. However, most analyses depend heavily on the specific properties of the optimization geometry or strong convexity of the empirical objective, and it remains unclear whether early-stopping could ever be less statistically efficient than explicit regularization for some particular shape constraint, especially in the overparameterized regime. To address this question, we study the setting of high-dimensional linear regression under additive Gaussian noise when the ground truth is assumed to lie in a known convex body and the task is to minimize the in-sample mean squared error. Our main result shows that for any convex body and any design matrix, up to an absolute constant factor, the worst-case risk of unconstrained early-stopped mirror descent with an appropriate potential is at most that of the least squares estimator constrained to the convex body. We achieve this by constructing algorithmic regularizers based on the Minkowski functional of the convex body.
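As a rough illustration of the early-stopping side of this comparison (a toy sketch, not the paper's construction): the potential below is the squared Euclidean norm, so each mirror step reduces to a plain gradient step, whereas the paper builds its potentials from the Minkowski functional of the convex body; the dimensions, noise level, step size, and stopping time in the usage snippet are illustrative assumptions.

```python
import numpy as np

def early_stopped_mirror_descent(X, y, steps=300, lr=None):
    """Early-stopped mirror descent for in-sample least squares (toy version).

    The potential here is psi(w) = ||w||_2^2 / 2, so each mirror step is an
    ordinary gradient step; the paper's estimator instead uses potentials
    built from the Minkowski functional of the convex body, which is not
    reproduced in this sketch. The full iterate path is returned so that a
    stopping time can be chosen afterwards.
    """
    n, d = X.shape
    if lr is None:
        lr = n / np.linalg.norm(X, 2) ** 2  # 1/L for the (1/2n)||Xw - y||^2 objective
    w = np.zeros(d)
    path, mse = [w.copy()], [np.mean((X @ w - y) ** 2)]
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n        # gradient of the in-sample objective
        w = w - lr * grad                   # mirror step under the Euclidean potential
        path.append(w.copy())
        mse.append(np.mean((X @ w - y) ** 2))
    return np.array(path), np.array(mse)

# Illustrative usage: Gaussian design, sparse ground truth, additive Gaussian
# noise; the stopping time t_stop is a made-up choice for the example.
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:5] = 1.0
y = X @ w_star + 0.5 * rng.standard_normal(n)
path, mse = early_stopped_mirror_descent(X, y)
t_stop = 25
print(f"in-sample MSE at t={t_stop}: {mse[t_stop]:.3f}")
```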


On the Variance, Admissibility, and Stability of Empirical Risk Minimization

arXiv.org Artificial Intelligence

Maximum Likelihood and the method of Least Squares are fundamental procedures in statistics. The study of asymptotic consistency of Maximum Likelihood has been central to the field for almost a century (Wald, 1949). Along with consistency, failures of Maximum Likelihood have been thoroughly investigated for nearly as long (Neyman and Scott, 1948; Bahadur, 1958; Ferguson, 1982). In the context of estimation of nonparametric models, the seminal work of Birgé and Massart (1993) provided sufficient conditions for minimax optimality (in a non-asymptotic sense) of Least Squares while also presenting an example of a model class where this basic procedure is sub-optimal. Three decades later, we still do not have necessary and sufficient conditions for minimax optimality of Least Squares when the model class is large. While the present paper does not resolve this question, it makes several steps towards understanding the behavior of Least Squares -- equivalently, Empirical Risk Minimization (ERM) with square loss -- in large models. Beyond intellectual curiosity, the question of minimax optimality of Least Squares is driven by the desire to understand the current practice of fitting large or overparametrized models, such as neural networks, to data (cf.
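For concreteness (a standard definition, not text from the paper's abstract): given data $(X_1, Y_1), \ldots, (X_n, Y_n)$ and a model class $\mathcal{F}$, Least Squares -- ERM with square loss -- returns $\hat{f}_n \in \operatorname{argmin}_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} (Y_i - f(X_i))^2$.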


On the Minimal Error of Empirical Risk Minimization

arXiv.org Machine Learning

An increasing number of machine learning applications employ flexible overparameterized models to fit the training data. Theoretical analysis of such 'overfitted' solutions has been a recent focus of the learning community. It is conjectured that the use of large overparameterized neural networks makes the loss landscape amenable to optimization through local search methods, such as stochastic gradient descent. It is also hypothesized that implicit regularization, arising from the choice of the optimization algorithm and the neural network architecture, mitigates the large complexity and ensures that the 'overfitted' solutions generalize. Suppose a 'simple' class H of models captures the relationship between the covariates X and the response variable Y. Inspired by the use of overparameterized models, we may take a much larger class F ⊇ H for computational or other purposes (such as lack of explicit description of H) and minimize training loss over this larger class.
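A toy numerical illustration of this setup (not from the paper; the sparse-linear choice of H, the dimensions, and the minimum-norm choice among the many empirical risk minimizers in the overparameterized class F are all assumptions made for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, s = 40, 120, 3                      # overparameterized: d > n; truth uses s coords
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:s] = 1.0                          # "simple" class H: s-sparse linear models
y = X @ w_star + 0.3 * rng.standard_normal(n)

# ERM over the small class H: least squares on the first s coordinates only
w_H = np.zeros(d)
w_H[:s], *_ = np.linalg.lstsq(X[:, :s], y, rcond=None)

# ERM over the much larger class F (all linear models): here the minimum-norm
# interpolating solution, one of many empirical risk minimizers when d > n
w_F = np.linalg.pinv(X) @ y

X_test = rng.standard_normal((2000, d))
y_test = X_test @ w_star
for name, w in [("ERM over H", w_H), ("ERM over F", w_F)]:
    train = np.mean((X @ w - y) ** 2)
    test = np.mean((X_test @ w - y_test) ** 2)
    print(f"{name}: train MSE {train:.3f}, test MSE {test:.3f}")
```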


Convex Regression in Multidimensions: Suboptimality of Least Squares Estimators

arXiv.org Machine Learning

The least squares estimator (LSE) is shown to be suboptimal in squared error loss in the usual nonparametric regression model with Gaussian errors for $d \geq 5$ for each of the following families of functions: (i) convex functions supported on a polytope (in fixed design), (ii) bounded convex functions supported on a polytope (in random design), and (iii) convex Lipschitz functions supported on any convex domain (in random design). For each of these families, the risk of the LSE is proved to be of the order $n^{-2/d}$ (up to logarithmic factors) while the minimax risk is $n^{-4/(d+4)}$, for $d \ge 5$. In addition, the first rate of convergence results (worst case and adaptive) for the full convex LSE are established for polytopal domains for all $d \geq 1$. Some new metric entropy results for convex functions are also proved which are of independent interest.
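For a given design, the convex LSE referenced above can be computed as a finite-dimensional quadratic program over fitted values and subgradients at the design points. Here is a minimal sketch, assuming the cvxpy package and a small sample size (the formulation uses the standard n(n-1) pairwise convexity constraints; variable names are illustrative):

```python
import numpy as np
import cvxpy as cp

def convex_lse(X, y):
    """Convex least squares estimator at the design points X (n x d).

    Fits values g_i and subgradients xi_i subject to the convexity
    constraints g_j >= g_i + <xi_i, x_j - x_i> for all pairs i != j,
    minimizing the sum of squared residuals.
    """
    n, d = X.shape
    g = cp.Variable(n)
    xi = cp.Variable((n, d))
    constraints = [
        g[j] >= g[i] + xi[i] @ (X[j] - X[i])
        for i in range(n) for j in range(n) if i != j
    ]
    prob = cp.Problem(cp.Minimize(cp.sum_squares(y - g)), constraints)
    prob.solve()
    return g.value, xi.value

# Illustrative usage on a small random design in d = 2
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
y = np.sum(X ** 2, axis=1) + 0.1 * rng.standard_normal(30)
g_hat, xi_hat = convex_lse(X, y)
```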


Space lower bounds for linear prediction

arXiv.org Machine Learning

We show that fundamental learning tasks, such as finding an approximate linear separator or linear regression, require memory at least \emph{quadratic} in the dimension, in a natural streaming setting. This implies that such problems cannot be solved (at least in this setting) by scalable memory-efficient streaming algorithms. Our results build on a memory lower bound for a simple linear-algebraic problem -- finding orthogonal vectors -- and utilize estimates on the packing of the Grassmannian, the manifold of all linear subspaces of a fixed dimension.
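For context on the quadratic scale (a standard construction, not the paper's lower-bound argument): the natural streaming algorithm for linear regression maintains the d x d second-moment matrix, i.e. Theta(d^2) memory, which is the order that the lower bound shows cannot be substantially improved in this setting. A minimal sketch, with illustrative class and parameter names:

```python
import numpy as np

class StreamingLeastSquares:
    """Streaming least squares via running sufficient statistics.

    Stores A = sum_t x_t x_t^T (d x d) and b = sum_t y_t x_t (length d),
    i.e. Theta(d^2) memory -- the quadratic scale that the lower bound
    shows cannot be avoided by a much more memory-efficient streaming
    algorithm.
    """
    def __init__(self, d):
        self.A = np.zeros((d, d))
        self.b = np.zeros(d)

    def update(self, x, y):
        self.A += np.outer(x, x)
        self.b += y * x

    def estimate(self, ridge=1e-8):
        # a small ridge term keeps the solve well-posed early in the stream
        d = self.A.shape[0]
        return np.linalg.solve(self.A + ridge * np.eye(d), self.b)

# Toy stream
rng = np.random.default_rng(2)
d = 10
w_star = rng.standard_normal(d)
model = StreamingLeastSquares(d)
for _ in range(1000):
    x = rng.standard_normal(d)
    model.update(x, x @ w_star + 0.1 * rng.standard_normal())
print(np.linalg.norm(model.estimate() - w_star))
```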