separatrix
Large-Step Training Dynamics of a Two-Factor Linear Transformer Model
Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instabilities and by the cubic-map phase diagram for quadratic regression, we study an exactly reducible one-prompt linear-transformer training problem. After normalization, the dynamics reduce to a two-factor product map with an effective step-size parameter \(μ\). On the balanced slice, this map recovers the known scalar cubic sequence of regimes: monotone convergence, catapult convergence, periodic and chaotic bounded nonconvergence, and divergence. We then analyze the full two-dimensional system and show that, for \(0<μ<2\), it has an explicit invariant Chebyshev ellipse separating forward-invariant regions; this ellipse carries off-balanced chaotic dynamics but is transversely repelling, while balanced scalar attractors can be transversely attracting. These results show that large constant learning rates can change the training attractor of the learned transformer rather than merely accelerating convergence: beyond sharp stability thresholds, finite-step training may settle into cycles, bounded chaos, or divergence instead of a single in-context linear-regression solution. We also discuss the consequences for training methods based on mini-batch gradient descent.
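The abstract does not write out the reduced map, so the following is only an illustrative sketch under an assumed toy model: gradient descent with step size `mu` on the scalar factorization loss L(a, b) = (a*b - 1)^2 / 2. The loss, the names `step`, `iterate`, and `cubic`, and the normalization of `mu` are all assumptions and need not match the paper's two-factor product map or its \(0<μ<2\) range. The sketch shows two structural features the abstract refers to: the balanced slice a = b is invariant, and on it the two-dimensional update collapses to a scalar cubic map.

```python
def step(a, b, mu):
    """One gradient-descent step on the ASSUMED toy loss L(a, b) = (a*b - 1)**2 / 2."""
    r = a * b - 1.0                      # residual of the product
    return a - mu * r * b, b - mu * r * a


def iterate(a, b, mu, n):
    """Apply the two-factor update n times."""
    for _ in range(n):
        a, b = step(a, b, mu)
    return a, b


def cubic(a, mu):
    """Restriction of the assumed map to the balanced slice a == b (a scalar cubic)."""
    return a + mu * a * (1.0 - a * a)


if __name__ == "__main__":
    # Small step on the balanced slice: the product a*b approaches 1.
    a, b = iterate(0.5, 0.5, mu=0.1, n=200)
    print("balanced, small step:", a, b, a * b)

    # Off the slice, the imbalance a^2 - b^2 is rescaled by (1 - mu^2 r^2) each step.
    a0, b0, mu = 1.3, 0.4, 0.2
    r0 = a0 * b0 - 1.0
    a1, b1 = step(a0, b0, mu)
    print(a1 * a1 - b1 * b1, (a0 * a0 - b0 * b0) * (1.0 - mu * mu * r0 * r0))
```

In this toy model the factor 1 - mu^2 r^2 governs whether a trajectory is pulled toward or pushed away from the balanced slice, which is one way to think about transverse attraction and repulsion; whether the paper's analysis takes this form is not stated in the abstract.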
Finding separatrices of dynamical flows with Deep Koopman Eigenfunctions
Dabholkar, Kabir V., Barak, Omri
Many natural systems, including neural circuits involved in decision making, are modeled as high-dimensional dynamical systems with multiple stable states. While existing analytical tools primarily describe behavior near stable equilibria, characterizing separatrices (the manifolds that delineate boundaries between different basins of attraction) remains challenging, particularly in high-dimensional settings. Here, we introduce a numerical framework that combines Koopman Theory with Deep Neural Networks to characterize separatrices. Specifically, we approximate Koopman Eigenfunctions (KEFs) associated with real positive eigenvalues, which vanish precisely at the separatrices. Because these KEFs are scalar functions, optimization methods can locate their zeros, and hence the separatrices, efficiently even in complex systems. We demonstrate our approach on synthetic benchmarks, ecological network models, and high-dimensional recurrent neural networks that are either trained on neuroscience-inspired tasks or fit to real neural data. Moreover, we illustrate the practical utility of our method by designing optimal perturbations that can shift systems across separatrices, enabling predictions relevant to optogenetic stimulation experiments in neuroscience.
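As a small illustration of a Koopman eigenfunction vanishing on a separatrix, consider the one-dimensional bistable flow dx/dt = x - x^3, chosen here as an assumed example that is not taken from the paper. Its stable states are x = -1 and x = +1, and the separatrix is the single point x = 0. For this flow, psi(x) = x / sqrt(1 - x^2) on (-1, 1) satisfies the eigenfunction equation f(x) psi'(x) = lambda psi(x) with lambda = 1. The sketch below checks this numerically with a finite difference; the names `f`, `psi`, and `residual` are hypothetical, and no neural network is involved.

```python
import math


def f(x):
    """Assumed example flow: dx/dt = x - x^3 (stable states at -1 and +1)."""
    return x - x ** 3


def psi(x):
    """Closed-form Koopman eigenfunction of this flow on (-1, 1), eigenvalue 1."""
    return x / math.sqrt(1.0 - x * x)


def residual(x, lam=1.0, h=1e-6):
    """Residual of f(x) * psi'(x) - lam * psi(x), using a central difference for psi'."""
    dpsi = (psi(x + h) - psi(x - h)) / (2.0 * h)
    return f(x) * dpsi - lam * psi(x)


if __name__ == "__main__":
    for x in (-0.8, -0.3, 0.0, 0.3, 0.8):
        print(x, psi(x), residual(x))
```

The sign of `psi` labels the basin (negative toward x = -1, positive toward x = +1), and its zero marks the separatrix. In higher dimensions no closed form is generally available, which is where a learned approximation of the eigenfunction, as described in the abstract, would take the place of `psi`.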
Computing non-equilibrium trajectories by a deep learning approach
Predicting the occurrence of rare and extreme events in complex systems is a well-known problem in non-equilibrium physics. These events can have huge impacts on human societies. New approaches that better estimate tail distributions have emerged in the last ten years. They often use large deviation concepts and avoid heavy direct ensemble simulations. In particular, a well-known approach is to derive a minimum action principle and to find its minimizers. The analysis of rare reactive events in non-equilibrium systems without detailed balance is notoriously difficult, both theoretically and computationally. In the limit of small noise, such events are described by the Freidlin-Wentzell action. We propose here a new method, called deep gMAM, which instead minimizes the geometrical action using neural networks. It relies on a natural and simple machine-learning formulation of the classical gMAM approach. We give a detailed description of the method as well as many examples. These include bimodal switches in complex stochastic (partial) differential equations, quasi-potential estimates, and extreme events in Burgers turbulence.
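To make the geometrical action concrete, here is a minimal sketch that only evaluates it on a fixed discretized path; it does not perform the neural-network minimization the abstract describes. It assumes a one-dimensional gradient system with drift f(x) = x - x^3, potential V(x) = (x^2 - 1)^2 / 4, unit diffusion, and the normalization in which the Freidlin-Wentzell action is (1/2) ∫ |φ' - f(φ)|^2 dt, so that the geometrical action reads ∫ (|f(φ)| |φ'| - f(φ) φ') ds. The function names are hypothetical. Under these assumptions the action of the transition from x = -1 to x = +1 should come out close to 2 (V(0) - V(-1)) = 0.5, with the entire cost paid on the uphill half of the path.

```python
def drift(x):
    """Assumed gradient drift f(x) = -V'(x) for the double well V below."""
    return x - x ** 3


def potential(x):
    """Assumed double-well potential V(x) = (x^2 - 1)^2 / 4."""
    return 0.25 * (x * x - 1.0) ** 2


def geometric_action(path):
    """Midpoint-rule estimate of the geometrical action along a 1D discretized path."""
    total = 0.0
    for x0, x1 in zip(path, path[1:]):
        dx = x1 - x0
        fm = drift(0.5 * (x0 + x1))          # drift at the segment midpoint
        total += abs(fm) * abs(dx) - fm * dx
    return total


if __name__ == "__main__":
    n = 2000
    path = [-1.0 + 2.0 * i / n for i in range(n + 1)]   # straight path from -1 to +1
    print("geometric action:", geometric_action(path))
    print("2 * barrier     :", 2.0 * (potential(0.0) - potential(-1.0)))
```

In one dimension the path between the two wells is fixed up to reparametrization, so no optimization is needed. In higher dimensions the path itself becomes the unknown, and a method such as the one described would parametrize it (for example with a neural network) and minimize this same functional.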