curvature
Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature. We extend the framework of natural policy gradient and propose to optimize both the actor and the critic using Kronecker-factored approximate curvature (K-FAC) with trust region; hence we call our method Actor Critic using Kronecker-Factored Trust Region (ACKTR). To the best of our knowledge, this is the first scalable trust region natural gradient method for actor-critic methods. It is also the method that learns non-trivial tasks in continuous control as well as discrete control policies directly from raw pixel inputs. We tested our approach across discrete domains in Atari games as well as continuous domains in the MuJoCo environment. With the proposed methods, we are able to achieve higher rewards and a 2-to 3-fold improvement in sample efficiency on average, compared to previous state-of-the-art on-policy actor-critic methods.
The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network
An important factor contributing to the success of deep learning has been the remarkable ability to optimize large neural networks using simple first-order optimization algorithms like stochastic gradient descent. While the efficiency of such methods depends crucially on the local curvature of the loss surface, very little is actually known about how this geometry depends on network architecture and hyperparameters. In this work, we extend a recently-developed framework for studying spectra of nonlinear random matrices to characterize an important measure of curvature, namely the eigenvalues of the Fisher information matrix. We focus on a single-hidden-layer neural network with Gaussian data and weights and provide an exact expression for the spectrum in the limit of infinite width. We find that linear networks suffer worse conditioning than nonlinear networks and that nonlinear networks are generically non-degenerate. We also predict and demonstrate empirically that by adjusting the nonlinearity, the spectrum can be tuned so as to improve the efficiency of first-order optimization methods.
- North America > United States > New York (0.04)
- Europe > United Kingdom (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (5 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Pennsylvania (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (6 more...)
- North America > United States (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Fribourg > Fribourg (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Europe > France > Provence-Alpes-Côte d'Azur (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Information Technology (0.67)
- Health & Medicine (0.46)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Data Science > Data Quality (0.94)
- (6 more...)
Fast Rates in Stochastic Online Convex Optimization by Exploiting the Curvature of Feasible Sets
In this work, we explore online convex optimization (OCO) and introduce a new condition and analysis that provides fast rates by exploiting the curvature of feasible sets. In online linear optimization, it is known that if the average gradient of loss functions exceeds a certain threshold, the curvature of feasible sets can be exploited by the follow-the-leader (FTL) algorithm to achieve a logarithmic regret. This study reveals that algorithms adaptive to the curvature of loss functions can also leverage the curvature of feasible sets. In particular, we first prove that if an optimal decision is on the boundary of a feasible set and the gradient of an underlying loss function is non-zero, then the algorithm achieves a regret bound of O ( ρ ln T) in stochastic environments. Here, ρ > 0 is the radius of the smallest sphere that includes the optimal decision and encloses the feasible set. Our approach, unlike existing ones, can work directly with convex loss functions, exploiting the curvature of loss functions simultaneously, and can achieve the logarithmic regret only with a local property of feasible sets. Additionally, the algorithm achieves an O ( T) regret even in adversarial environments, in which FTL suffers an Ω( T) regret, and achieves an O ( ρ ln T + Cρ ln T) regret in corrupted stochastic environments with corruption level C .
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
- North America > United States > California (0.04)
- Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)