
Collaborating Authors

 Davis, Damek


Online Covariance Estimation in Nonsmooth Stochastic Approximation

arXiv.org Machine Learning

We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting covariance matrix in the local minimax sense of Hájek and Le Cam. However, no methods have been proposed to estimate this covariance matrix in a nonsmooth and potentially non-monotone (nonconvex) setting. In this paper, we study an online batch-means covariance matrix estimator introduced in Zhu et al. (2023). The estimator groups the SA iterates appropriately and computes the sample covariance among batches as an estimate of the limiting covariance. Its construction does not require prior knowledge of the total sample size, and updates can be performed recursively as new data arrives. We establish that, as long as the batch size sequence is properly specified (depending on the stepsize sequence), the estimator achieves a convergence rate of order $O(\sqrt{d}n^{-1/8+\varepsilon})$ for any $\varepsilon>0$, where $d$ and $n$ denote the problem dimensionality and the number of iterations (or samples) used. Although the problem is nonsmooth and potentially non-monotone (nonconvex), our convergence rate matches the best-known rate for covariance estimation methods using only first-order information in smooth and strongly convex settings. The consistency of this covariance estimator enables asymptotically valid statistical inference, including constructing confidence intervals and performing hypothesis testing.
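A minimal sketch of the batch-means idea in Python: group the SA iterates into batches and take the sample covariance of (suitably scaled) batch means. The quadratic test problem, the $1/k^{0.6}$ stepsize, and the geometric batch-size schedule are illustrative assumptions, not the schedule from Zhu et al. (2023); for clarity the sketch also stores all iterates rather than updating recursively.

```python
import numpy as np

def batch_means_covariance(iterates, batch_sizes):
    """Sample covariance of batch means (illustrative sketch, not the
    exact estimator of Zhu et al. (2023))."""
    means, start = [], 0
    for b in batch_sizes:
        means.append(iterates[start:start + b].mean(axis=0))
        start += b
    M = np.asarray(means) - np.mean(means, axis=0)   # center across batches
    # scale each centered batch mean by its batch size before averaging
    S = sum(b * np.outer(m, m) for b, m in zip(batch_sizes, M))
    return S / (len(batch_sizes) - 1)

# toy driver: SA on a quadratic whose limiting covariance is the identity
rng = np.random.default_rng(0)
d = 3
sizes = [int(60 * 1.2 ** j) for j in range(25)]      # growing batches (assumption)
x, xs = np.zeros(d), []
for k in range(1, sum(sizes) + 1):
    g = x + rng.normal(size=d)                       # gradient of ||x||^2/2 plus noise
    x = x - g / k ** 0.6                             # polynomially decaying stepsize
    xs.append(x.copy())
print(batch_means_covariance(np.array(xs), sizes))   # roughly the identity
```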


Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth

arXiv.org Artificial Intelligence

A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory and algorithm on the problems of matrix sensing and factorization and learning a single neuron in the overparameterized regime.
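A caricature of the stepsize policy in Python, assuming the minimal value $f^\star$ is known (here zero): interlace many short gradient steps with a single long Polyak step of length $(f(x)-f^\star)/\|\nabla f(x)\|^2$. The ravine construction itself is not implemented; the test function $f(x)=\|x\|^4$ exhibits fourth-order growth, and all step counts and stepsizes are illustrative.

```python
import numpy as np

def gd_with_polyak_bursts(f, grad, x0, f_star=0.0, short_lr=1e-3,
                          n_short=50, n_rounds=40):
    """Interlace short gradient steps with one long Polyak step per round."""
    x = x0.copy()
    for _ in range(n_rounds):
        for _ in range(n_short):                    # short steps: approach the ravine
            x = x - short_lr * grad(x)
        g = grad(x)
        if g @ g > 1e-16:                           # long step: Polyak stepsize
            x = x - ((f(x) - f_star) / (g @ g)) * g
    return x

# fourth-order growth: f(x) = ||x||^4 has no quadratic growth at its minimizer
f = lambda x: (x @ x) ** 2
grad = lambda x: 4.0 * (x @ x) * x
print(np.linalg.norm(gd_with_polyak_bursts(f, grad, np.ones(5))))  # near zero
```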


Aiming towards the minimizers: fast convergence of SGD for overparametrized problems

arXiv.org Artificial Intelligence

Modern machine learning paradigms, such as deep learning, occur in or close to the interpolation regime, wherein the number of model parameters is much larger than the number of data samples. In this work, we propose a regularity condition within the interpolation regime which endows the stochastic gradient method with the same worst-case iteration complexity as the deterministic gradient method, while using only a single sampled gradient (or a minibatch) in each iteration. In contrast, all existing guarantees require the stochastic gradient method to take small steps, thereby resulting in a much slower linear rate of convergence. Finally, we demonstrate that our condition holds when training sufficiently wide feedforward neural networks with a linear output layer.
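A toy Python illustration of the regime (not the paper's regularity condition): an overparametrized, consistent least-squares problem, where the stochastic gradient method with a large constant stepsize of order $1/L$ and a single sampled gradient per iteration still drives the loss to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 100                               # more parameters than samples
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)                   # consistent system: interpolation holds
x = np.zeros(d)
L = max(np.sum(A * A, axis=1))               # largest single-sample smoothness constant
for _ in range(2000):
    i = rng.integers(n)                      # one sampled gradient per iteration
    x -= (1.0 / L) * (A[i] @ x - b[i]) * A[i]
print(0.5 * np.linalg.norm(A @ x - b) ** 2)  # essentially zero
```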


Asymptotic normality and optimality in nonsmooth stochastic approximation

arXiv.org Machine Learning

Polyak and Juditsky [30] famously showed that the stochastic gradient method for minimizing smooth and strongly convex functions enjoys a central limit theorem: the error between the running average of the iterates and the minimizer, normalized by the square root of the iteration counter, converges to a normal random vector. Moreover, the covariance matrix of the limiting distribution is in a precise sense "optimal" among any estimation procedure. A long-standing open question is whether similar guarantees - asymptotic normality and optimality - exist for nonsmooth optimization and, more generally, for equilibrium problems. In this work, we obtain such guarantees under mild conditions that hold both in concrete circumstances (e.g.
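The smooth, strongly convex case that the abstract departs from can be simulated in a few lines of Python: the normalized error of the averaged SGD iterate is approximately Gaussian with the sandwich covariance $H^{-1}\Sigma H^{-1}$. The nonsmooth extension that is the paper's actual contribution is not reproduced here; the problem data and stepsizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, trials = 2, 3000, 300
H = np.diag([1.0, 3.0])                  # Hessian of f(x) = x' H x / 2, minimizer 0
errors = []
for _ in range(trials):
    x, avg = np.zeros(d), np.zeros(d)
    for k in range(1, n + 1):
        g = H @ x + rng.normal(size=d)   # noisy gradient, noise covariance I
        x -= g / k ** 0.75
        avg += (x - avg) / k             # running average of the iterates
    errors.append(np.sqrt(n) * avg)      # normalized error, since x* = 0
print(np.cov(np.array(errors).T))        # approx H^{-1} I H^{-1} = diag(1, 1/9)
```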


Active manifolds, stratifications, and convergence to local minima in nonsmooth optimization

arXiv.org Artificial Intelligence

We show that the subgradient method converges only to local minimizers when applied to generic Lipschitz continuous and subdifferentially regular functions that are definable in an o-minimal structure. At a high level, the argument we present is appealingly transparent: we interpret the nonsmooth dynamics as an approximate Riemannian gradient method on a certain distinguished submanifold that captures the nonsmooth activity of the function. In the process, we develop new regularity conditions in nonsmooth analysis that parallel the stratification conditions of Whitney, Kuo, and Verdier and extend stochastic processes techniques of Pemantle.
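A toy picture of the active-manifold idea, under illustrative assumptions: for $f(x, y) = |x| + \tfrac{1}{2}(y-1)^2$ the nonsmooth activity lives on the manifold $\{x = 0\}$, and the subgradient iterates oscillate ever more tightly across it while the smooth coordinate performs ordinary gradient descent along it. The paper's generic, definable setting is of course far beyond this example.

```python
import numpy as np

def subgrad(z):
    x, y = z
    return np.array([np.sign(x), y - 1.0])   # a subgradient (0 is valid at x = 0)

z = np.array([1.0, -2.0])
for k in range(1, 5001):
    z = z - subgrad(z) / k                    # diminishing stepsizes
print(z)                                      # approaches the local minimizer (0, 1)
```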


Clustering a Mixture of Gaussians with Unknown Covariance

arXiv.org Machine Learning

We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-Cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-Cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions.
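A hypothetical toy experiment in Python, showing only a naive spectral baseline rather than the paper's Max-Cut program or its spectral algorithm for ill-conditioned covariances: when the shared covariance is well-conditioned, the top eigenvector of the sample second moment aligns with the mean direction, and projection signs recover the clusters.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 20
mu = np.zeros(d); mu[0] = 3.0
labels = rng.choice([-1, 1], size=n)
X = labels[:, None] * mu + rng.normal(size=(n, d))  # shared covariance = I
S = X.T @ X / n                                     # sample second moment
eigvals, V = np.linalg.eigh(S)
pred = np.sign(X @ V[:, -1])                        # project onto top eigenvector
err = min(np.mean(pred != labels), np.mean(pred != -labels))
print(err)                                          # small misclassification rate
```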


Escaping strict saddle points of the Moreau envelope in nonsmooth optimization

arXiv.org Machine Learning

Recent work has shown that stochastically perturbed gradient methods can efficiently escape strict saddle points of smooth functions. We extend this body of work to nonsmooth optimization, by analyzing an inexact analogue of a stochastically perturbed gradient method applied to the Moreau envelope. The main conclusion is that a variety of algorithms for nonsmooth optimization can escape strict saddle points of the Moreau envelope at a controlled rate. The main technical insight is that typical algorithms applied to the proximal subproblem yield directions that approximate the gradient of the Moreau envelope in relative terms.
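The technical insight above rests on a standard identity: the Moreau envelope $f_\lambda(x) = \min_y f(y) + \tfrac{1}{2\lambda}\|y - x\|^2$ has gradient $\nabla f_\lambda(x) = (x - \operatorname{prox}_{\lambda f}(x))/\lambda$, so approximately solving the proximal subproblem yields an approximate envelope gradient. A quick check in Python for $f = \|\cdot\|_1$, whose envelope is the Huber function; the perturbed step at the end uses an illustrative stepsize and noise scale.

```python
import numpy as np

def soft_threshold(x, lam):
    """prox of lam * ||.||_1"""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

lam, x = 0.5, np.array([2.0, 0.1, -0.3])
g_prox = (x - soft_threshold(x, lam)) / lam        # envelope gradient via the prox
g_huber = np.clip(x / lam, -1.0, 1.0)              # Huber gradient, computed directly
assert np.allclose(g_prox, g_huber)

# one stochastically perturbed envelope-gradient step (illustrative constants)
rng = np.random.default_rng(4)
x = x - 0.1 * (g_prox + 0.01 * rng.normal(size=x.shape))
```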


Robust stochastic optimization with the proximal point method

arXiv.org Machine Learning

Stochastic convex optimization lies at the core of modern statistics and machine learning. Standard results in the subject bound the number of samples that an algorithm needs to generate a point with small function value in expectation. More nuanced high probability guarantees are rarer, and typically either rely on "light-tails" assumptions or exhibit worse sample complexity. To address this issue, we show that a wide class of stochastic algorithms for strongly convex problems can be augmented with high confidence bounds at an overhead cost that is only logarithmic in the confidence level and polylogarithmic in the condition number. We discuss consequences both for streaming and offline algorithms. The procedure we propose, called proxBoost, is elementary and combines two well-known ingredients: robust distance estimation and the proximal point method.
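A caricature of the two ingredients in Python, with the constants and the selection rule as illustrative stand-ins for the actual proxBoost procedure: run a low-confidence stochastic solver several times, keep the output that is robustly close to the others, then re-center through a proximal term and repeat.

```python
import numpy as np

rng = np.random.default_rng(5)

def robust_select(points):
    """Robust distance estimation (simplified): return the candidate whose
    median distance to the other candidates is smallest."""
    P = np.asarray(points)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    return P[np.argmin(np.median(D, axis=1))]

def base_solver(center, mu):
    """Cheap SGD on f(x) = ||x||^2/2 + (mu/2)||x - center||^2 with
    heavy-tailed gradient noise, so any single run may be poor."""
    x = center.copy()
    for k in range(1, 201):
        g = x + mu * (x - center) + rng.standard_t(df=2, size=x.shape)
        x -= g / (k * (1 + mu))
    return x

def prox_boost_sketch(x0, rounds=4, trials=11, mu=1.0):
    center = x0
    for _ in range(rounds):
        center = robust_select([base_solver(center, mu) for _ in range(trials)])
    return center

print(np.linalg.norm(prox_boost_sketch(10.0 * np.ones(5))))  # shrinks toward 0
```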


Stochastic algorithms with geometric step decay converge linearly on sharp functions

arXiv.org Machine Learning

Stochastic (sub)gradient methods require step size schedule tuning to perform well in practice. Classical tuning strategies decay the step size polynomially and lead to optimal sublinear rates on (strongly) convex problems. An alternative schedule, popular in nonconvex optimization, is called \emph{geometric step decay} and proceeds by halving the step size after every few epochs. In recent work, geometric step decay was shown to improve exponentially upon classical sublinear rates for the class of \emph{sharp} convex functions. In this work, we ask whether geometric step decay similarly improves stochastic algorithms for the class of sharp nonconvex problems. Such losses feature in modern statistical recovery problems and lead to a new challenge not present in the convex setting: the region of convergence is local, so one must bound the probability of escape. Our main result shows that for a large class of stochastic, sharp, nonsmooth, and nonconvex problems a geometric step decay schedule endows well-known algorithms with a local linear rate of convergence to global minimizers. This guarantee applies to the stochastic projected subgradient, proximal point, and prox-linear algorithms. As an application of our main result, we analyze two statistical recovery tasks---phase retrieval and blind deconvolution---and match the best known guarantees under Gaussian measurement models and establish new guarantees under heavy-tailed distributions.
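A minimal Python sketch of geometric step decay for the stochastic subgradient method on a sharp function, $f(x)=\|x\|$, which grows linearly away from its minimizer; the epoch length, decay factor, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
x, step, epoch_len = np.ones(10), 0.5, 200
for t in range(4000):
    if t > 0 and t % epoch_len == 0:
        step *= 0.5                              # halve the step size each epoch
    g = x / (np.linalg.norm(x) + 1e-12)          # subgradient of ||x||
    x -= step * (g + 0.1 * rng.normal(size=x.shape))
print(np.linalg.norm(x))                         # error roughly halves per epoch
```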


A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning

arXiv.org Machine Learning

In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization. For datasets of size $n$, our approach requires $O(n^{2/3}/\varepsilon)$ gradient evaluations to reach $\varepsilon$-accuracy and, when a certain error bound holds, the complexity improves to $O(\kappa n^{2/3}\log(1/\varepsilon))$. These rates are $n^{1/3}$ times better than those achieved by typical, full gradient methods.
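A hedged sketch of the trimming template in Python: fit the model parameters jointly with per-sample weights on a capped simplex that can zero out suspected outliers. The alternating scheme below (a full gradient step in the parameters, then uniform weights on the smallest losses) is a simple deterministic stand-in for the paper's stochastic proximal-gradient method, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, h = 200, 5, 160                      # trust h of the n samples
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true
b[:40] += 10.0 * rng.normal(size=40)       # contaminate 20% of the labels

x = np.zeros(d)
for _ in range(500):
    losses = 0.5 * (A @ x - b) ** 2
    w = np.zeros(n)                        # capped-simplex vertex: uniform weight
    w[np.argsort(losses)[:h]] = 1.0 / h    # on the h smallest losses
    x -= 0.5 * (A.T @ (w * (A @ x - b)))   # gradient step on the trimmed fit
print(np.linalg.norm(x - x_true))          # near zero despite contamination
```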