AITopics | Carmon, Yair

Carmon, Yair

Making SGD Parameter-Free

Carmon, Yair, Hinder, Oliver

arXiv.org Artificial Intelligence, Apr-13-2023

Stochastic convex optimization (SCO) is a cornerstone of both the theory and practice of machine learning. Consequently, there is intense interest in developing SCO algorithms that require little to no prior knowledge of the problem parameters, and hence little to no tuning [27, 23, 20, 2, 22, 39]. In this work we consider the fundamental problem of non-smooth SCO (in a potentially unbounded domain) and seek methods that are adaptive to a key problem parameter: the initial distance to optimality. Current approaches for tackling this problem focus on the more general online learning problem of parameter-free regret minimization [8, 10, 11, 12, 21, 24, 25, 30, 32, 37], where the goal is to obtain regret guarantees that are valid for comparators with arbitrary norms. Research on parameter-free regret minimization has led to practical algorithms for stochastic optimization [9, 27, 32], methods that are able to adapt to many problem parameters simultaneously [37], and methods that can work with any norm [12].
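
To see why the initial distance to optimality matters, recall the textbook tuning of subgradient descent: for a convex, G-Lipschitz objective and initial distance D = |x0 - x*|, the step size D / (G * sqrt(T)) guarantees that the average iterate is at most D * G / sqrt(T) suboptimal after T steps. The sketch below is not the algorithm of this paper. It is a minimal, noiseless, one-dimensional illustration of how a badly underestimated D slows the method down; the objective f(x) = |x - 3| and all names are assumptions chosen for this example.

```python
from math import sqrt


def sign(z):
    """Subgradient of |z|: -1, 0, or +1."""
    return (z > 0) - (z < 0)


def averaged_subgradient_descent(subgrad, x0, step, num_steps):
    """Run fixed-step subgradient descent and return the average iterate."""
    x = x0
    total = 0.0
    for _ in range(num_steps):
        total += x
        x -= step * subgrad(x)
    return total / num_steps


# Hypothetical toy problem: f(x) = |x - 3|, so x* = 3 and G = 1.
X_STAR, G, T = 3.0, 1.0, 100


def f(x):
    return abs(x - X_STAR)


def gap_with_distance_guess(d_guess, x0=0.0):
    """Suboptimality of the average iterate when the step size uses d_guess for D."""
    step = d_guess / (G * sqrt(T))
    x_avg = averaged_subgradient_descent(
        lambda x: G * sign(x - X_STAR), x0, step, T
    )
    return f(x_avg)


tuned_gap = gap_with_distance_guess(3.0)      # uses the true distance D = 3
mistuned_gap = gap_with_distance_guess(0.03)  # underestimates D by a factor of 100
print(tuned_gap, mistuned_gap)
```

With the true distance, the gap respects the D * G / sqrt(T) = 0.3 guarantee; with the underestimate, the iterates barely move in 100 steps. Removing the need to guess D in advance is what "parameter-free" refers to here.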

algorithm 1, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2205.0216

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Malign Overfitting: Interpolation Can Provably Preclude Invariance

Wald, Yoav, Yona, Gal, Shalit, Uri, Carmon, Yair

arXiv.org Artificial Intelligence, Nov-28-2022

Learned classifiers should often possess certain invariance properties meant to encourage fairness, robustness, or out-of-distribution generalization. However, multiple recent works empirically demonstrate that common invariance-inducing regularizers are ineffective in the over-parameterized regime, in which classifiers perfectly fit (i.e., interpolate) the training data. This suggests that the phenomenon of "benign overfitting," in which models generalize well despite interpolating, might not favorably extend to settings in which robustness or fairness are desirable. In this work we provide a theoretical justification for these observations. We prove that -- even in the simplest of settings -- any interpolating learning rule (with arbitrarily small margin) will not satisfy these invariance properties. We then propose and analyze an algorithm that -- in the same setting -- successfully learns a non-interpolating classifier that is provably invariant. We validate our theoretical observations on simulated data and the Waterbirds dataset.

artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2211.15724

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)

Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization

Miller, John, Taori, Rohan, Raghunathan, Aditi, Sagawa, Shiori, Koh, Pang Wei, Shankar, Vaishaal, Liang, Percy, Carmon, Yair, Schmidt, Ludwig

arXiv.org Machine Learning, Jul-9-2021

For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
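
A minimal sketch of how such a correlation could be measured, assuming one (in-distribution, out-of-distribution) accuracy pair per model. The accuracy values below are made-up placeholders, not results from the paper, and the function name is hypothetical; the paper's actual evaluation protocol may differ, for example in how the axes are scaled.

```python
from math import sqrt


def correlation_and_linear_fit(xs, ys):
    """Return (Pearson r, slope, intercept) of the least-squares line
    y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Made-up placeholder accuracies for five hypothetical models (not from the paper).
in_dist_acc = [0.70, 0.80, 0.85, 0.90, 0.95]
out_dist_acc = [0.45, 0.56, 0.60, 0.67, 0.71]

r, slope, intercept = correlation_and_linear_fit(in_dist_acc, out_dist_acc)
print(f"r = {r:.3f}, fit: ood = {slope:.2f} * id + {intercept:.2f}")
```

A value of r close to 1 on such pairs is what "accuracy on the line" would look like; a weaker correlation would show up as r noticeably below 1.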

deep learning, linear trend, neural network, (23 more...)

arXiv.org Machine Learning

2107.04649

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Large-Scale Methods for Distributionally Robust Optimization

Levy, Daniel, Carmon, Yair, Duchi, John C., Sidford, Aaron

arXiv.org Machine Learning, Oct-12-2020

We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $\chi^2$ uncertainty sets these are the first such guarantees in the literature, and for CVaR our guarantees scale linearly in the uncertainty level rather than quadratically as in previous work. We also provide lower bounds proving the worst-case optimality of our algorithms for CVaR and a penalized version of the $\chi^2$ problem. Our primary technical contributions are novel bounds on the bias of batch robust risk estimation and the variance of a multilevel Monte Carlo gradient estimator due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9 to 36 times more efficient than full-batch methods.
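
For a finite sample of n losses, CVaR at level alpha is the worst-case reweighted average loss over weight vectors q with sum(q) = 1 and 0 <= q_i <= 1 / (alpha * n); when alpha * n is an integer, this is simply the mean of the alpha * n largest losses. The sketch below evaluates that full-batch objective directly. It is not the paper's gradient-based algorithm, and the function name and toy losses are assumptions made for illustration.

```python
def empirical_cvar(losses, alpha):
    """Empirical CVaR at level alpha in (0, 1].

    Greedily places the maximum allowed weight 1 / (alpha * n) on the
    largest losses until the total weight reaches 1.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    n = len(losses)
    cap = 1.0 / (alpha * n)   # largest weight any single example may receive
    remaining = 1.0           # probability mass still to be assigned
    total = 0.0
    for loss in sorted(losses, reverse=True):
        weight = min(cap, remaining)
        total += weight * loss
        remaining -= weight
        if remaining <= 0.0:
            break
    return total


# Toy per-example losses (assumed values, for illustration only).
toy_losses = [1.0, 2.0, 3.0, 4.0]
print(empirical_cvar(toy_losses, 1.0))   # alpha = 1 recovers the plain average
print(empirical_cvar(toy_losses, 0.5))   # average of the worst half
```

Smaller alpha concentrates the objective on the hardest examples, which is the sense in which the uncertainty level controls robustness.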

cvar, neural network, optimization problem, (19 more...)

arXiv.org Machine Learning

2010.05893

Country:

Europe > United Kingdom > England (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Arjevani, Yossi, Carmon, Yair, Duchi, John C., Foster, Dylan J., Sekhari, Ayush, Sridharan, Karthik

arXiv.org Machine Learning, Jun-24-2020

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and, surprisingly, that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.

artificial intelligence, estimator, machine learning, (17 more...)

arXiv.org Machine Learning

2006.13476

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Unlabeled Data Improves Adversarial Robustness

Carmon, Yair, Raghunathan, Aditi, Schmidt, Ludwig, Liang, Percy, Duchi, John C.

arXiv.org Machine Learning, Jun-10-2019

The past few years have seen intense research interest in making models robust to adversarial examples [37]. Yet despite a wide range of proposed defenses, the state of the art in adversarial robustness is far from satisfactory. Recent work points towards sample complexity as a possible reason for the small gains in robustness: Schmidt et al. [35] show that in a simple model, learning a classifier with nontrivial adversarially robust accuracy requires substantially more samples than achieving good "standard" accuracy. Furthermore, recent empirical work obtains promising gains in robustness via transfer learning of a robust classifier from a larger labeled dataset [15]. While both theory and experiments suggest that more training data leads to greater robustness, following this suggestion can be difficult due to the cost of gathering additional data and especially obtaining high-quality labels.

accuracy, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

1905.13736

Country: North America > Canada > Ontario > Toronto (0.14)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

A Rank-1 Sketch for Matrix Multiplicative Weights

Carmon, Yair, Duchi, John C., Sidford, Aaron, Tian, Kevin

arXiv.org Machine Learning, Mar-6-2019

We show that a simple randomized sketch of the matrix multiplicative weight (MMW) update enjoys the same regret bounds as MMW, up to a small constant factor. Unlike MMW, where every step requires full matrix exponentiation, our steps require only a single product of the form $e^A b$, which the Lanczos method approximates efficiently. Our key technique is to view the sketch as a randomized mirror projection, and perform mirror descent analysis on the expected projection. Our sketch solves the online eigenvector problem, improving the best known complexity bounds. We also apply this sketch to a simple no-regret scheme for semidefinite programming in saddle-point form, where it matches the best known guarantees.

artificial intelligence, cosh, machine learning, (19 more...)

arXiv.org Machine Learning

1903.02675

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Analysis of Krylov Subspace Solutions of Regularized Non-Convex Quadratic Problems

Carmon, Yair, Duchi, John C.

Neural Information Processing Systems, Dec-31-2018

We provide convergence rates for Krylov subspace solutions to the trust-region and cubic-regularized (nonconvex) quadratic problems. Such solutions may be efficiently computed by the Lanczos method and have long been used in practice. We prove error bounds of the form $1/t^2$ and $e^{-4t/\sqrt{\kappa}}$, where $\kappa$ is a condition number for the problem, and $t$ is the Krylov subspace order (number of Lanczos iterations). We also provide lower bounds showing that our analysis is sharp.

artificial intelligence, machine learning, optimization, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East (0.14)
North America > United States (0.14)
North America > Canada (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

No bad local minima: Data independent training error guarantees for multilayer neural networks

Soudry, Daniel, Carmon, Yair

arXiv.org Machine Learning, May-30-2016

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing about the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.

artificial intelligence, mnn, neural network, (15 more...)

arXiv.org Machine Learning

1605.08361

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
