AITopics | Doikov, Nikita

Collaborating Authors

Doikov, Nikita

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Improving Stochastic Cubic Newton with Momentum

Chayti, El Mahdi, Doikov, Nikita, Jaggi, Martin

arXiv.org Artificial IntelligenceOct-25-2024

We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.

artificial intelligence, machine learning, momentum, (17 more...)

arXiv.org Artificial Intelligence

2410.19644

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Cubic regularized subspace Newton for non-convex optimization

Zhao, Jim, Lucchi, Aurelien, Doikov, Nikita

arXiv.org Artificial IntelligenceJun-24-2024

This paper addresses the optimization problem of minimizing non-convex continuous functions, which is relevant in the context of high-dimensional machine learning applications characterized by over-parametrization. We analyze a randomized coordinate second-order method named SSCN which can be interpreted as applying cubic regularization in random subspaces. This approach effectively reduces the computational complexity associated with utilizing second-order information, rendering it applicable in higher-dimensional scenarios. Theoretically, we establish convergence guarantees for non-convex functions, with interpolating rates for arbitrary subspace sizes and allowing inexact curvature estimation. When increasing subspace size, our complexity matches $\mathcal{O}(\epsilon^{-3/2})$ of the cubic regularization (CR) rate. Additionally, we propose an adaptive sampling scheme ensuring exact convergence rate of $\mathcal{O}(\epsilon^{-3/2}, \epsilon^{-3})$ to a second-order stationary point, even without sampling all coordinates. Experimental results demonstrate substantial speed-ups achieved by SSCN compared to conventional first-order methods.

artificial intelligence, machine learning, sscn, (18 more...)

arXiv.org Artificial Intelligence

2406.16666

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods

Chayti, El Mahdi, Doikov, Nikita, Jaggi, Martin

arXiv.org Artificial IntelligenceSep-6-2023

We study stochastic Cubic Newton methods for solving general possibly non-convex minimization problems. We propose a new framework, which we call the helper framework, that provides a unified view of the stochastic and variance-reduced second-order algorithms equipped with global complexity guarantees. It can also be applied to learning with auxiliary information. Our helper framework offers the algorithm designer high flexibility for constructing and analyzing the stochastic Cubic Newton methods, allowing arbitrary size batches, and the use of noisy and possibly biased estimates of the gradients and Hessians, incorporating both the variance reduction and the lazy Hessian updates. We recover the best-known complexities for the stochastic and variance-reduced Cubic Newton, under weak assumptions on the noise. A direct consequence of our theory is the new lazy stochastic second-order method, which significantly improves the arithmetic complexity for large dimension problems. We also establish complexity bounds for the classes of gradient-dominated objectives, that include convex and strongly convex problems. For Auxiliary Learning, we show that using a helper (auxiliary function) can outperform training alone if a given similarity measure is small.

artificial intelligence, hessian, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2302.11962

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

First and zeroth-order implementations of the regularized Newton method with lazy approximated Hessians

Doikov, Nikita, Grapiglia, Geovani Nunes

arXiv.org Artificial IntelligenceSep-5-2023

In this work, we develop first-order (Hessian-free) and zero-order (derivative-free) implementations of the Cubically regularized Newton method for solving general non-convex optimization problems. For that, we employ finite difference approximations of the derivatives. We use a special adaptive search procedure in our algorithms, which simultaneously fits both the regularization constant and the parameters of the finite difference approximations. It makes our schemes free from the need to know the actual Lipschitz constants. Additionally, we equip our algorithms with the lazy Hessian update that reuse a previously computed Hessian approximation matrix for several iterations. Specifically, we prove the global complexity bound of $\mathcal{O}( n^{1/2} \epsilon^{-3/2})$ function and gradient evaluations for our new Hessian-free method, and a bound of $\mathcal{O}( n^{3/2} \epsilon^{-3/2} )$ function evaluations for the derivative-free method, where $n$ is the dimension of the problem and $\epsilon$ is the desired accuracy for the gradient norm. These complexity bounds significantly improve the previously known ones in terms of the joint dependence on $n$ and $\epsilon$, for the first-order and zeroth-order non-convex optimization.

approximation, artificial intelligence, complexity, (16 more...)

arXiv.org Artificial Intelligence

2309.02412

Country:

North America > United States (0.14)
Europe > Switzerland (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.71)

Add feedback

Shuffle SGD is Always Better than SGD: Improved Analysis of SGD with Arbitrary Data Orders

Koloskova, Anastasia, Doikov, Nikita, Stich, Sebastian U., Jaggi, Martin

arXiv.org Artificial IntelligenceAug-8-2023

Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Random Reshuffling (RR) and Single Shuffle (SS) being popular choices for cycling through random or single permutations of the training data. However, the convergence properties of these algorithms in the non-convex case are not fully understood. Existing results suggest that, in realistic training scenarios where the number of epochs is smaller than the training set size, RR may perform worse than SGD. In this paper, we analyze a general SGD algorithm that allows for arbitrary data orderings and show improved convergence rates for non-convex functions. Specifically, our analysis reveals that SGD with random and single shuffling is always faster or at least as good as classical SGD with replacement, regardless of the number of iterations. Overall, our study highlights the benefits of using SGD with random/single shuffling and provides new insights into its convergence properties for non-convex optimization.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2305.19259

Country:

Europe (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Linearization Algorithms for Fully Composite Optimization

Vladarean, Maria-Luiza, Doikov, Nikita, Jaggi, Martin, Flammarion, Nicolas

arXiv.org Artificial IntelligenceJul-12-2023

This paper studies first-order algorithms for solving fully composite optimization problems over convex and compact sets. We leverage the structure of the objective by handling its differentiable and non-differentiable components separately, linearizing only the smooth parts. This provides us with new generalizations of the classical Frank-Wolfe method and the Conditional Gradient Sliding algorithm, that cater to a subclass of non-differentiable problems. Our algorithms rely on a stronger version of the linear minimization oracle, which can be efficiently implemented in several practical applications. We provide the basic version of our method with an affine-invariant analysis and prove global convergence rates for both convex and non-convex objectives. Furthermore, in the convex case, we propose an accelerated method with correspondingly improved complexity. Finally, we provide illustrative experiments to support our theoretical results.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2302.12808

Country: Europe (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Second-order optimization with lazy Hessians

Doikov, Nikita, Chayti, El Mahdi, Jaggi, Martin

arXiv.org Artificial IntelligenceJun-15-2023

We analyze Newton's method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetical complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetical complexity of second-order algorithms by a factor $\sqrt{d}$.

artificial intelligence, hessian, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2212.00781

Country: North America > United States > Hawaii (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Polynomial Preconditioning for Gradient Methods

Doikov, Nikita, Rodomanov, Anton

arXiv.org Artificial IntelligenceJan-30-2023

We study first-order methods with preconditioning for solving structured nonlinear convex optimization problems. We propose a new family of preconditioners generated by symmetric polynomials. They provide first-order optimization methods with a provable improvement of the condition number, cutting the gaps between highest eigenvalues, without explicit knowledge of the actual spectrum. We give a stochastic interpretation of this preconditioning in terms of coordinate volume sampling and compare it with other classical approaches, including the Chebyshev polynomials. We show how to incorporate a polynomial preconditioning into the Gradient and Fast Gradient Methods and establish the corresponding global complexity bounds. Finally, we propose a simple adaptive search procedure that automatically chooses the best possible polynomial preconditioning for the Gradient Method, minimizing the objective along a low-dimensional Krylov subspace. Numerical experiments confirm the efficiency of our preconditioning strategies for solving various machine learning problems.

artificial intelligence, krylov, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2301.13194

Country: Europe (0.28)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Add feedback