AITopics | new q-newton

Collaborating Authors

new q-newton

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

New Q-Newton's method meets Backtracking line search: good convergence guarantee, saddle points avoidance, quadratic rate of convergence, and easy implementation

Truong, Tuyen Trung

arXiv.org Machine LearningAug-23-2021

In a recent joint work, the author has developed a modification of Newton's method, named New Q-Newton's method, which can avoid saddle points and has quadratic rate of convergence. While good theoretical convergence guarantee has not been established for this method, experiments on small scale problems show that the method works very competitively against other well known modifications of Newton's method such as Adaptive Cubic Regularization and BFGS, as well as first order methods such as Unbounded Two-way Backtracking Gradient Descent. In this paper, we resolve the convergence guarantee issue by proposing a modification of New Q-Newton's method, named New Q-Newton's method Backtracking, which incorporates a more sophisticated use of hyperparameters and a Backtracking line search. This new method has very good theoretical guarantees, which for a {\bf Morse function} yields the following (which is unknown for New Q-Newton's method): {\bf Theorem.} Let $f:\mathbb{R}^m\rightarrow \mathbb{R}$ be a Morse function, that is all its critical points have invertible Hessian. Then for a sequence $\{x_n\}$ constructed by New Q-Newton's method Backtracking from a random initial point $x_0$, we have the following two alternatives: i) $\lim _{n\rightarrow\infty}||x_n||=\infty$, or ii) $\{x_n\}$ converges to a point $x_{\infty}$ which is a {\bf local minimum} of $f$, and the rate of convergence is {\bf quadratic}. Moreover, if $f$ has compact sublevels, then only case ii) happens. As far as we know, for Morse functions, this is the best theoretical guarantee for iterative optimization algorithms so far in the literature. We have tested in experiments on small scale, with some further simplified versions of New Q-Newton's method Backtracking, and found that the new method significantly improve New Q-Newton's method.

backtracking, method backtracking, new q-newton, (12 more...)

arXiv.org Machine Learning

2108.10249

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
Europe > Germany (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Add feedback

Unconstrained optimisation on Riemannian manifolds

Truong, Tuyen Trung

arXiv.org Machine LearningAug-31-2020

In this paper, we give explicit descriptions of versions of (Local-) Backtracking Gradient Descent and New Q-Newton's method to the Riemannian setting.Here are some easy to state consequences of results in this paper, where X is a general Riemannian manifold of finite dimension and $f:X\rightarrow \mathbb{R}$ a $C^2$ function which is Morse (that is, all its critical points are non-degenerate). {\bf Theorem.} For random choices of the hyperparameters in the Riemanian Local Backtracking Gradient Descent algorithm and for random choices of the initial point $x_0$, the sequence $\{x_n\}$ constructed by the algorithm either (i) converges to a local minimum of $f$ or (ii) eventually leaves every compact subsets of $X$ (in other words, diverges to infinity on $X$). If $f$ has compact sublevels, then only the former alternative happens. The convergence rate is the same as in the classical paper by Armijo. {\bf Theorem.} Assume that $f$ is $C^3$. For random choices of the hyperparametes in the Riemannian New Q-Newton's method, if the sequence constructed by the algorithm converges, then the limit is a critical point of $f$. We have a local Stable-Center manifold theorem, near saddle points of $f$, for the dynamical system associated to the algorithm. If the limit point is a non-degenerate minimum point, then the rate of convergence is quadratic. If moreover $X$ is an open subset of a Lie group and the initial point $x_0$ is chosen randomly, then we can globally avoid saddle points. As an application, we propose a general method using Riemannian Backtracking GD to find minimum of a function on a bounded ball in a Euclidean space, and do explicit calculations for calculating the smallest eigenvalue of a symmetric square matrix.

artificial intelligence, backtracking gd, machine learning, (16 more...)

arXiv.org Machine Learning

2008.11091

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

A modification of quasi-Newton's methods helping to avoid saddle points

Truong, Tuyen Trung, To, Tat Dat, Nguyen, Tuan Hang, Nguyen, Thu Hang, Nguyen, Hoang Phuong, Helmy, Maged

arXiv.org Machine LearningJun-2-2020

We recall that if $A$ is an invertible and symmetric real $m\times m$ matrix, then it is diagonalisable. Therefore, if we denote by $\mathcal{E}^{+}(A)\subset \mathbb{R}^m$ (respectively $\mathcal{E}^{-}(A)\subset \mathbb{R}^m$) to be the vector subspace generated by eigenvectors with positive eigenvalues of $A$ (correspondingly the vector subspace generated by eigenvectors with negative eigenvalues of $A$), then we have an orthogonal decomposition $\mathbb{R}^m=\mathcal{E}^{+}(A)\oplus \mathcal{E}^{-}(A)$. Hence, every $x\in \mathbb{R}^m$ can be written uniquely as $x=pr_{A,+}(x)+pr_{A,-}(x)$ with $pr_{A,+}(x)\in \mathcal{E}^{+}(A)$ and $pr_{A,-}(x)\in \mathcal{E}^{-}(A)$. We propose the following simple new modification of quasi-Newton's methods. {\bf New Q-Newton's method.} Let $\Delta =\{\delta _0,\delta _1,\delta _2,\ldots \}$ be a countable set of real numbers which has at least $m+1$ elements. Let $f:\mathbb{R}^m\rightarrow \mathbb{R}$ be a $C^2$ function. Let $\alpha >0$. For each $x\in \mathbb{R}^m$ such that $\nabla f(x)\not=0$, let $\delta (x)=\delta _j$, where $j$ is the smallest number so that $\nabla ^2f(x)+\delta _j||\nabla f(x)||^{1+\alpha}Id$ is invertible. (If $\nabla f(x)=0$, then we choose $\delta (x)=\delta _0$.) Let $x_0\in \mathbb{R}^m$ be an initial point. We define a sequence of $x_n\in \mathbb{R}^m$ and invertible and symmetric $m\times m$ matrices $A_n$ as follows: $A_n=\nabla ^2f(x_n)+\delta (x_n) ||\nabla f(x_n)||^{1+\alpha}Id$ and $x_{n+1}=x_n-w_n$, where $w_n=pr_{A_n,+}(v_n)-pr_{A_n,-}(v_n)$ and $v_n=A_n^{-1}\nabla f(x_n)$. The main result of this paper roughly says that if $f$ is $C^3$ and a sequence $\{x_n\}$, constructed by the New Q-Newton's method from a random initial point $x_0$, {\bf converges}, then the limit point is not a saddle point, and the convergence rate is the same as that of Newton's method.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Machine Learning

2006.01512

Country:

Europe > Norway > Eastern Norway > Oslo (0.05)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Mathematics of Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback