acceleration
Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms to accelerate convergence and improve the sampling efficiency of deep RL. Despite these empirical gains, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor.
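Anderson mixing itself is easy to state outside the RL setting: given a fixed-point map, it combines the last few iterates and residuals through a small least-squares problem, with a damping parameter controlling how aggressively the mixed step is taken. Below is a minimal NumPy sketch of damped Anderson mixing in the standard Walker-Ni form for a generic fixed-point problem x = g(x); the map `g`, the memory `m`, and the damping `beta` are illustrative placeholders, and this is not the specific scheme analyzed in the paper.

```python
import numpy as np

def damped_anderson(g, x0, m=5, beta=0.5, iters=100, tol=1e-8):
    """Damped Anderson mixing for the fixed-point problem x = g(x) (Walker-Ni form)."""
    x = np.asarray(x0, dtype=float)
    xs, fs = [x.copy()], [g(x) - x]            # histories of iterates and residuals
    for _ in range(iters):
        f = fs[-1]
        if np.linalg.norm(f) < tol:
            break
        mk = min(m, len(fs) - 1)               # number of difference columns available
        if mk == 0:
            x = x + beta * f                   # plain damped fixed-point step
        else:
            dX = np.stack([xs[-i] - xs[-i - 1] for i in range(mk, 0, -1)], axis=1)
            dF = np.stack([fs[-i] - fs[-i - 1] for i in range(mk, 0, -1)], axis=1)
            # Least-squares mixing coefficients: minimize ||f - dF @ gamma||.
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = x + beta * f - (dX + beta * dF) @ gamma
        xs.append(x.copy())
        fs.append(g(x) - x)
        if len(xs) > m + 1:                    # keep only the last m+1 pairs
            xs.pop(0); fs.pop(0)
    return x
```

Applied to a contraction such as a Bellman-style backup, the mixed iterates typically reach a given residual tolerance in noticeably fewer evaluations of g than the plain damped iteration, which is the behavior the RL acceleration schemes above exploit.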
Acceleration through Optimistic No-Regret Dynamics
We consider the problem of minimizing a smooth convex function by reducing the optimization to computing the Nash equilibrium of a particular zero-sum convex-concave game. Zero-sum games can be solved using online learning dynamics, where a classical technique simulates two no-regret algorithms playing against each other; after $T$ rounds, the average iterate is guaranteed to solve the original optimization problem with error decaying as $O(\log T/T)$. In this paper we show that the technique can be enhanced to a rate of $O(1/T^2)$ by extending recent work \cite{RS13,SALS15} that leverages \textit{optimistic learning} to speed up equilibrium computation. The resulting optimization algorithm derived from this analysis coincides \textit{exactly} with the well-known Nesterov accelerated gradient method \cite{N83a}, and indeed the same argument allows us to recover several variants of Nesterov's algorithm via small tweaks. We are also able to establish the accelerated linear rate for functions that are both strongly convex and smooth. This methodology unifies a number of different iterative optimization methods: we show that the Heavy Ball algorithm is precisely the non-optimistic variant of Nesterov's method, and recent prior work already established a similar perspective on Frank-Wolfe \cite{AW17,ALLW18}.
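The optimistic ingredient the abstract refers to can be illustrated on a toy zero-sum game: each player takes a gradient step on twice the current gradient minus the previous one, which anticipates the opponent's next move. The sketch below shows optimistic gradient descent-ascent on a small bilinear game; it illustrates the no-regret dynamics the paper builds on, not the Fenchel-game construction that recovers Nesterov's method exactly, and the matrix, step size, and iteration count are arbitrary choices.

```python
import numpy as np

# Toy zero-sum bilinear game g(x, y) = x^T A y; x is the min player, y the max player.
A = np.array([[1.0, 2.0],
              [-1.0, 1.0]])
x, y = np.array([1.0, -1.0]), np.array([0.5, 1.0])
gx_prev, gy_prev = A @ y, A.T @ x        # "hint" gradients from the previous round
eta, T = 0.1, 500
for t in range(T):
    gx, gy = A @ y, A.T @ x              # partial gradients of x^T A y
    # Optimistic update: step on twice the current gradient minus the last one.
    x = x - eta * (2 * gx - gx_prev)
    y = y + eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy
# The unique equilibrium of this game is (0, 0); plain simultaneous gradient
# descent-ascent cycles or diverges here, while the optimistic dynamics
# drive both players toward the equilibrium.
print(np.linalg.norm(x), np.linalg.norm(y))
```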
Direct Runge-Kutta Discretization Achieves Acceleration
We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator. Furthermore, we introduce a new local flatness condition on the objective, under which rates even faster than $\mathcal{O}(N^{-2})$ can be achieved with low-order integrators and only gradient information. Notably, this flatness condition is satisfied by several standard loss functions used in machine learning. We provide numerical experiments that verify the theoretical rates predicted by our results.
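To make the recipe concrete, here is a small sketch that applies the classical fourth-order Runge-Kutta method to the second-order ODE commonly associated with Nesterov's method, $\ddot{x} + (3/t)\dot{x} + \nabla f(x) = 0$, rewritten as a first-order system. The quadratic objective, step size, and horizon are illustrative choices, and the paper's analysis concerns a related but differently parameterized ODE and general order-$s$ integrators.

```python
import numpy as np

Q = np.diag([1.0, 10.0])                 # toy smooth convex objective f(x) = 0.5 x^T Q x

def grad_f(x):
    return Q @ x

def vector_field(t, z):
    """Nesterov ODE x'' + (3/t) x' + grad f(x) = 0 as a first-order system z = (x, v)."""
    x, v = z[:2], z[2:]
    return np.concatenate([v, -3.0 / t * v - grad_f(x)])

def rk4_step(t, z, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = vector_field(t, z)
    k2 = vector_field(t + h / 2, z + h / 2 * k1)
    k3 = vector_field(t + h / 2, z + h / 2 * k2)
    k4 = vector_field(t + h, z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

z = np.concatenate([np.array([1.0, 1.0]), np.zeros(2)])  # start at x0 with zero velocity
t, h = 1.0, 0.05                                          # start at t = 1 to avoid the 3/t singularity
for _ in range(400):
    z = rk4_step(t, z, h)
    t += h
print(0.5 * z[:2] @ Q @ z[:2])           # objective value; the minimum is 0 at the origin
```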
Reviews: Variational PDEs for Acceleration on Manifolds and Application to Diffeomorphisms
Summary: The main contribution of this paper is the derivation of an "accelerated" gradient descent scheme for computing the stationary point of a potential function on diffeomorphisms, inspired by the variational formulation of Nesterov's accelerated gradient methods [1]. The authors first derive the continuous-time and continuous-space analogue of the Bregman Lagrangian [1] for diffeomorphisms, then apply the discretization to solve image registration problems, empirically showing faster/better convergence than gradient descent. Pros: The paper is well-written. The proposed scheme of solving diffeomorphic registration by discretizing a variational solution similar to [1] is a novel contribution, to the best of my knowledge. The authors also show strong empirical support for the proposed method vs. gradient descent.
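For readers less familiar with [1], the discrete-time counterpart on Euclidean space is just Nesterov-style acceleration: extrapolate along the previous displacement, then take a gradient step on the potential. The sketch below contrasts that update with plain gradient descent on a toy quadratic potential; it is only meant to convey the kind of speed-up the review refers to, since the paper's scheme evolves diffeomorphisms via PDEs and is far beyond this illustration.

```python
import numpy as np

H = np.diag([1.0, 50.0])                 # toy ill-conditioned potential U(x) = 0.5 x^T H x

def grad_U(x):
    return H @ x

x0 = np.array([1.0, 1.0])
x_gd, x_acc, x_prev = x0.copy(), x0.copy(), x0.copy()
eta = 1.0 / 50.0                         # step size 1/L for the largest curvature
for k in range(1, 301):
    # Plain gradient descent step.
    x_gd = x_gd - eta * grad_U(x_gd)
    # Nesterov-style accelerated step: extrapolate, then take a gradient step.
    y = x_acc + (k - 1) / (k + 2) * (x_acc - x_prev)
    x_prev, x_acc = x_acc, y - eta * grad_U(y)
# The accelerated iterate reaches a much smaller potential value than plain
# gradient descent after the same number of gradient evaluations.
print(0.5 * x_gd @ H @ x_gd, 0.5 * x_acc @ H @ x_acc)
```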
The Power of Physical Representations
Leibniz's (1984) An Introduction to a Secret Encyclopedia includes the following marginal note: Principle of Physical Certainty: Everything which men have experienced always and in many ways will still happen: for example that iron sinks in water (Leibniz 1984). In our daily lives, we routinely use this principle. Thus, we know that we can pull with a string but not push with it; that a flower pot dropped from our balcony falls to the ground and breaks; that when we place a container of water over a fire, the water might boil after a while and overflow the container. The origin of such knowledge is a matter of constant debate. It is clear that we learn a great deal about the physical world as we grow up.
New Optimizations Improve Deep Learning Frameworks For CPUs
Since most of us need more than a "machine learning only" server, I'll focus on the reality of how Intel Xeon SP Platinum processors remain the best choice for servers, including servers that need to do machine learning as part of their workload. Here is a partial rundown of key software that accelerates deep learning on Intel Xeon Platinum processors enough that the best performance advantage of GPUs is closer to 2X than to 100X. There is also a good article in Parallel Universe Magazine, Issue 28, starting on page 26, titled "Solving Real-World Machine Learning Problems with Intel Data Analytics Acceleration Library." High-core-count CPUs (the Intel Xeon Phi processors, in particular the upcoming "Knights Mill" version) and FPGAs (Intel Xeon processors coupled with Intel/Altera FPGAs) offer highly flexible options with excellent price/performance and power efficiency.
Ray Kurzweil's Four Big Insights for Predicting the Future
Self-driving cars, virtual reality games, bioprinting human organs, human gene editing, AI personalities, 3D printing in space, three billion people connected to the Internet…. These incredible technological feats are all part of our world today. And while they are not evenly distributed, they are rapidly spreading and evolving -- and in the process radically changing nearly every aspect of modern life. How we eat, work, play, communicate, and travel are deeply affected by the development of new technology. But what is the underlying engine that drives technological progress? To answer some of these questions, our team decided to dig into Ray Kurzweil's 2005 book The Singularity Is Near, in which Kurzweil describes the exponential growth of technologies like artificial intelligence, genetics, computers, nanotechnology and robotics. According to Kurzweil, Moore's Law (describing the exponential growth of integrated circuits) is just one example of the law of accelerating returns, but it is perhaps the most powerful. New technology growing exponentially tends to progress deceptively slowly at first, but then its progress shoots upward and very quickly becomes disruptive.