escape time
Review for NeurIPS paper: Escaping the Gravitational Pull of Softmax
Summary and Contributions: ##Update## The rebuttal adequately addressed my main concerns, and I am consequently increasing my score to a 7. In particular, I was pleased that the authors investigated the issues with the learning rate; I would be happy if they mentioned this potential limitation in their revisions and included the experimental results showing that the naive adaptive-learning-rate proposals I made would not be effective. It was also pleasing that they will discuss and compare with Neural Replicator Dynamics, and the additional experiment with sampled actions looks promising. The reason I did not increase my score further is that the current set of experiments is still rather simple, and it is difficult for me to assess whether the new method is likely to be widely used. That said, I feel the contribution may well turn out to be quite influential.
Tipping Points of Evolving Epidemiological Networks: Machine Learning-Assisted, Data-Driven Effective Modeling
Evangelou, Nikolaos, Cui, Tianqi, Bello-Rivas, Juan M., Makeev, Alexei, Kevrekidis, Ioannis G.
We study the tipping point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network in a data-driven, machine learning-assisted manner. We identify a parameter-dependent effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables through a deep-learning ResNet architecture inspired by numerical stochastic integrators. We construct an approximate effective bifurcation diagram based on the identified drift term of the eSDE and contrast it with the mean-field SIS model bifurcation diagram. We observe a subcritical Hopf bifurcation in the evolving network's effective SIS dynamics that causes the tipping point behavior; this takes the form of large amplitude collective oscillations that spontaneously -- yet rarely -- arise from the neighborhood of a (noisy) stationary state. We study the statistics of these rare events both through repeated brute force simulations and by using established mathematical/computational tools exploiting the right-hand side of the identified SDE. We demonstrate that such a collective SDE can also be identified (and the rare events computations also performed) in terms of data-driven coarse observables, obtained here via manifold learning techniques, in particular Diffusion Maps. The workflow of our study is straightforwardly applicable to other complex dynamics problems exhibiting tipping point behavior.
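The brute-force rare-event computation the abstract describes can be sketched in a few lines: an Euler-Maruyama loop (the same additive update structure the ResNet-style stochastic integrator mimics) driven by a drift/diffusion pair, followed by first-escape-time estimation over repeated paths. The double-well drift and constant diffusion below are illustrative assumptions standing in for the identified eSDE, not the paper's learned model.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, h, n_steps, rng):
    """Simulate one path of dx = drift(x) dt + diffusion(x) dW, the same
    update structure a ResNet-style stochastic integrator learns."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for n in range(n_steps):
        xi = rng.standard_normal()
        x[n + 1] = x[n] + h * drift(x[n]) + np.sqrt(h) * diffusion(x[n]) * xi
    return x

def first_escape_time(path, h, threshold):
    """Time of the first excursion beyond `threshold`; np.inf if none occurs."""
    hits = np.flatnonzero(np.abs(path) > threshold)
    return hits[0] * h if hits.size else np.inf

rng = np.random.default_rng(0)
# Toy stand-ins for the identified eSDE terms (double-well drift, constant noise).
drift = lambda x: x - x ** 3
diffusion = lambda x: 0.5
times = [first_escape_time(euler_maruyama(drift, diffusion, -1.0, 0.01, 20000, rng),
                           0.01, 1.5) for _ in range(20)]
```

Repeating this over many paths gives the brute-force escape statistics; the abstract's alternative is to feed the identified right-hand side to established rare-event tools instead.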
Revisiting the Noise Model of Stochastic Gradient Descent
Battash, Barak, Lindenbaum, Ofir
The stochastic gradient noise (SGN) is a significant factor in the success of stochastic gradient descent (SGD). Following the central limit theorem, SGN was initially modeled as Gaussian, and more recently it has been suggested that stochastic gradient noise is better characterized using the $S\alpha S$ L\'evy distribution. This claim was subsequently disputed, with later work reverting to the previously suggested Gaussian noise model. This paper presents solid, detailed empirical evidence that SGN is heavy-tailed and better depicted by the $S\alpha S$ distribution. Furthermore, we argue that different parameters in a deep neural network (DNN) hold distinct SGN characteristics throughout training. To more accurately approximate the dynamics of SGD near a local minimum, we construct a novel framework in $\mathbb{R}^N$, based on a L\'evy-driven stochastic differential equation (SDE), where one-dimensional L\'evy processes model each parameter in the DNN. Next, we show that SGN jump intensity (frequency and amplitude) depends on the learning rate decay mechanism (LRdecay); furthermore, we demonstrate empirically that the LRdecay effect may stem from the reduction of the SGN and not the decrease in the step size. Based on our analysis, we examine the mean escape time, trapping probability, and more properties of DNNs near local minima. Finally, we prove that the training process will likely exit from the basin in the direction of parameters with heavier-tailed SGN. We will share our code for reproducibility.
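Heavy-tailed $S\alpha S$ noise of the kind the abstract advocates can be sampled with the standard Chambers-Mallows-Stuck construction. The sketch below is an illustration (not the authors' code) contrasting alpha = 2, which recovers a Gaussian, with a heavy-tailed alpha < 2.

```python
import numpy as np

def sample_sas(alpha, size, rng):
    """Draw symmetric alpha-stable (S-alpha-S) variates via the
    Chambers-Mallows-Stuck method (valid for 0 < alpha <= 2, alpha != 1)."""
    v = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * v) / np.cos(v) ** (1 / alpha)
            * (np.cos(v - alpha * v) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(1)
gauss_like = sample_sas(2.0, 10000, rng)  # alpha = 2 recovers a Gaussian (variance 2)
heavy = sample_sas(1.2, 10000, rng)       # alpha < 2 gives power-law tails
```

Comparing the extreme values of the two samples makes the heavy-tail claim concrete: the alpha = 1.2 sample produces occasional jumps orders of magnitude larger than anything a Gaussian of comparable scale generates.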
Noise-induced degeneration in online learning
Sato, Yuzuru, Tsutsui, Daiji, Fujiwara, Akio
Gradient descent is the simplest optimisation algorithm, represented by gradient dynamics in a potential. When the input data are finite, the gradient descent dynamics fluctuates due to finite-size effects; the resulting algorithm is called stochastic gradient descent. In this paper, we study the stability of stochastic gradient descent dynamics from the viewpoint of dynamical systems theory. Learning is characterised as nonautonomous dynamics driven by uncertain external input, and as multi-scale dynamics consisting of slow memory dynamics and fast system dynamics. When the uncertain input sequences are modelled by stochastic processes, the dynamics of learning is described by a random dynamical system. In contrast to the traditional Fokker-Planck approaches [5, 15], the random dynamical system approach enables the study not only of stationary distributions and global statistics, but also of the pathwise structure of stochastic dynamics. Based on nonautonomous and random dynamical system theory, it is possible to analyse stability and bifurcation in machine learning.
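The pathwise (random dynamical system) viewpoint can be illustrated with a toy one-parameter least-squares problem: two trajectories started from different initial conditions but driven by the same random sample sequence contract toward each other, a pathwise structure that a Fokker-Planck description of the stationary distribution cannot see. The model and learning rate below are illustrative assumptions.

```python
import numpy as np

def sgd_random_map(x0, data, lr, order):
    """Iterate SGD on f(x) = mean of (x - d)^2 / 2, drawing one datum per step
    in the given (random) order: a random dynamical system on the parameter."""
    x = x0
    path = [x]
    for i in order:
        x = x - lr * (x - data[i])  # stochastic gradient of (x - d_i)^2 / 2
        path.append(x)
    return np.array(path)

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, 100)
order = rng.integers(0, 100, 500)           # one shared noise realization
a = sgd_random_map(5.0, data, 0.1, order)   # two different initial conditions,
b = sgd_random_map(-5.0, data, 0.1, order)  # same driving noise
gap = np.abs(a - b)                         # contracts exactly as (1 - lr)^n
```

Because the per-step map is affine with slope 1 - lr, the gap between the two trajectories shrinks deterministically even though each trajectory individually keeps fluctuating, which is the pathwise synchronization the random dynamical system framework makes precise.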
A Diffusion Theory for Deep Learning Dynamics: Stochastic Gradient Descent Escapes From Sharp Minima Exponentially Fast
Xie, Zeke, Sato, Issei, Sugiyama, Masashi
Stochastic optimization algorithms, such as Stochastic Gradient Descent (SGD) and its variants, are mainstream methods for training deep networks in practice. However, the theoretical mechanism behind gradient noise remains to be fully investigated. Deep learning is known to find flat minima, which have a large neighboring region in parameter space in which each weight vector has similarly small error. In this paper, we focus on a fundamental problem in deep learning, "How can deep learning usually find flat minima among so many minima?" To answer the question, we develop a density diffusion theory (DDT) for revealing the fundamental dynamical mechanism of SGD and deep learning. More specifically, we study how the escape time from loss valleys to the outside of valleys depends on minima sharpness, gradient noise and hyperparameters. One of the most interesting findings is that stochastic gradient noise from SGD can help escape from sharp minima exponentially faster than flat minima, while white noise can only help escape from sharp minima polynomially faster than flat minima. We also find that large-batch training requires exponentially many iterations to pass through sharp minima and find flat minima. We present direct empirical evidence supporting the proposed theoretical results.
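The escape-time quantity at the center of the abstract can be probed with a toy brute-force experiment (an illustrative sketch, not the paper's density diffusion theory): mean first-exit times of an overdamped Langevin iterate from wells of different curvature. With additive white noise and a fixed exit radius, the sharper well has the higher effective barrier and so traps the iterate longer, in the Kramers exponential fashion; the paper's point is that SGD's state-dependent gradient noise changes this picture relative to white noise.

```python
import numpy as np

def mean_escape_time(curvature, noise_std, r=1.0, h=0.01, n_paths=200,
                     max_steps=100000, rng=None):
    """Mean first-exit time of dx = -curvature * x dt + noise_std dW
    from the interval |x| < r, estimated by brute-force simulation."""
    rng = rng or np.random.default_rng(0)
    times = []
    for _ in range(n_paths):
        x, n = 0.0, 0
        while abs(x) < r and n < max_steps:
            x += -curvature * x * h + noise_std * np.sqrt(h) * rng.standard_normal()
            n += 1
        times.append(n * h)
    return float(np.mean(times))

# Kramers picture: effective barrier ~ curvature * r^2 / 2, so with the same
# additive noise the sharper well traps the iterate exponentially longer.
t_flat = mean_escape_time(0.5, 0.8)
t_sharp = mean_escape_time(2.0, 0.8)
```

Swapping the additive `noise_std` for a state-dependent, curvature-aligned noise model is the kind of modification the paper's SGN analysis motivates.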
Towards Resilient UAV: Escape Time in GPS Denied Environment with Sensor Drift
Yoon, Hyung-Jin, Wan, Wenbin, Kim, Hunmin, Hovakimyan, Naira, Sha, Lui, Voulgaris, Petros G.
This paper considers a resilient state estimation framework for unmanned aerial vehicles (UAVs) that integrates a Kalman filter-like state estimator and an attack detector. When an attack is detected, the state estimator uses only IMU signals as the GPS signals do not contain legitimate information. This limited sensor availability induces a sensor drift problem questioning the reliability of the sensor estimates. We propose a new resilience measure, escape time, as the safe time within which the estimation errors remain in a tolerable region with high probability. This paper analyzes the stability of the proposed resilient estimation framework and quantifies a lower bound for the escape time. Moreover, simulations of the UAV model demonstrate the performance of the proposed framework and provide analytical results.
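The escape-time measure can be estimated by Monte Carlo for a simple error model (a one-dimensional drift-plus-noise stand-in for the IMU dead-reckoning error, not the paper's UAV dynamics): the safe time is then a high-probability lower quantile of the first time the error leaves the tolerable region.

```python
import numpy as np

def escape_times(drift, noise_std, tol, h=0.1, horizon=1000, n_runs=300, seed=3):
    """Monte Carlo first times at which the IMU-only estimation error leaves
    the tolerable region |e| < tol, under constant sensor drift plus noise."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_runs)
    for k in range(n_runs):
        e, t = 0.0, horizon * h
        for n in range(horizon):
            e += drift * h + noise_std * np.sqrt(h) * rng.standard_normal()
            if abs(e) >= tol:
                t = (n + 1) * h
                break
        out[k] = t
    return out

times = escape_times(drift=0.05, noise_std=0.2, tol=2.0)
# A high-probability "safe time" in the abstract's sense: a low quantile of
# the first-exit time, so the error stays tolerable up to it with prob. 0.95.
safe_time = float(np.quantile(times, 0.05))
```

The paper derives an analytical lower bound on this quantity; the simulation above is the empirical counterpart one would use to check such a bound.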
Adaptive Stochastic Gradient Langevin Dynamics: Taming Convergence and Saddle Point Escape Time
In this paper, we propose a new adaptive stochastic gradient Langevin dynamics (ASGLD) algorithmic framework and its two specialized versions, namely adaptive stochastic gradient (ASG) and adaptive gradient Langevin dynamics (AGLD), for non-convex optimization problems. All proposed algorithms can escape from saddle points with at most $O(\log d)$ iterations, which is nearly dimension-free. Further, we show that ASGLD and ASG converge to a local minimum with at most $O(\log d/\epsilon^4)$ iterations. Also, ASGLD with full gradients, or ASGLD with a slowly linearly increasing batch size, converges to a local minimum with iterations bounded by $O(\log d/\epsilon^2)$, which outperforms existing first-order methods.
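A hedged sketch of the algorithmic family the abstract describes: gradient Langevin dynamics with an adaptive preconditioner. The AdaGrad-style adaptivity rule, step sizes, and toy saddle objective below are assumptions for illustration; the paper's exact ASGLD update is not reproduced here.

```python
import numpy as np

def adaptive_langevin(grad, x0, lr=0.1, temp=1e-3, eps=1e-8, n_steps=2000, seed=4):
    """Adaptive gradient Langevin sketch: AdaGrad-style preconditioning plus
    injected Gaussian noise (an illustration, not the paper's exact ASGLD)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    acc = np.zeros_like(x)  # accumulated squared gradients
    for _ in range(n_steps):
        g = grad(x)
        acc += g * g
        precond = 1.0 / (np.sqrt(acc) + eps)
        noise = np.sqrt(2 * lr * temp * precond) * rng.standard_normal(x.shape)
        x -= lr * precond * g + noise
    return x

# Toy non-convex objective with a saddle at the origin:
# f(x, y) = x^2 - y^2 + y^4, with minima at (0, +/- 1/sqrt(2)).
grad = lambda p: np.array([2 * p[0], -2 * p[1] + 4 * p[1] ** 3])
x_final = adaptive_langevin(grad, [0.5, 0.3])
```

The injected Langevin noise is what supplies the saddle-escape guarantee in this family of methods; setting `temp = 0` recovers a plain adaptive (ASG-like) iteration.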