Stochastic Cubic Regularization for Fast Nonconvex Optimization
Tripuraneni, Nilesh, Stern, Mitchell, Jin, Chi, Regier, Jeffrey, Jordan, Michael I.
In this setting, we only have access to the stochastic function f(x; ξ), where the random variable ξ is sampled from an underlying distribution D. The task is to optimize the expected function f(x), which in general may be nonconvex. This framework covers a wide range of problems, including the offline setting where we minimize the empirical loss over a fixed amount of data, and the online setting where data arrives sequentially. One of the most prominent applications of stochastic optimization has been in large-scale statistics and machine learning problems, such as the optimization of deep neural networks. Classical analysis in nonconvex optimization only guarantees convergence to a first-order stationary point (i.e., a point x satisfying ‖ f(x)‖ 0), which can be a local minimum, a local maximum, or a saddle point. This paper goes further, proposing an algorithm that escapes saddle points and converges to a local minimum.
Dec-5-2017
- Country:
- North America > United States
- California > Alameda County > Berkeley (0.04)
- Asia > Middle East
- Jordan (0.04)
- Africa > Middle East
- Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > United States
- Genre:
- Research Report (0.82)
- Industry:
- Education (0.48)
- Technology: