Time-Scale Coupling Between States and Parameters in Recurrent Neural Networks
We study how gating mechanisms in recurrent neural networks (RNNs) implicitly induce adaptive learning-rate behavior, even when training is carried out with a fixed, global learning rate. This effect arises from the coupling between state-space time scales, parametrized by the gates, and parameter-space dynamics during gradient descent. By deriving exact Jacobians for leaky-integrator and gated RNNs, we obtain a first-order expansion that makes explicit how constant, scalar, and multi-dimensional gates reshape gradient propagation, modulate effective step sizes, and introduce anisotropy in parameter updates. These findings reveal that gates not only control information flow but also act as data-driven preconditioners that adapt optimization trajectories in parameter space. Empirical simulations corroborate these claims: across several sequence tasks, we show that gates induce lag-dependent effective learning rates and a directional concentration of gradient flow, with multi-gate models matching or exceeding the anisotropic structure produced by Adam. These results highlight that optimizer-driven and gate-driven adaptivity are complementary but not equivalent mechanisms. Overall, this work provides a unified dynamical-systems perspective on how gating couples state evolution with parameter updates, explaining why gated architectures achieve robust trainability and stability in practice.
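To make the coupling concrete, here is a minimal sketch of the simplest case the abstract mentions: a leaky-integrator RNN with a constant scalar gate. All names, sizes, and the specific update rule h' = (1-a)h + a·tanh(Wh + Ux + b) are assumptions for illustration, not the paper's exact formulation. The exact state-to-state Jacobian of that update is diag(1-a) + a·diag(1-tanh²(·))·W, and the spectral norm of the product of Jacobians over a lag is the factor by which a gradient injected at that lag is rescaled, i.e. the lag-dependent effective learning rate the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8                                     # hidden size (hypothetical)
W = rng.normal(scale=0.3, size=(n, n))    # recurrent weights
U = rng.normal(scale=0.3, size=(n, n))    # input weights
b = np.zeros(n)

def step(h, x, a):
    """One leaky-integrator update h' = (1-a)*h + a*tanh(W h + U x + b),
    returning the new state and the exact Jacobian dh'/dh."""
    pre = W @ h + U @ x + b
    h_new = (1 - a) * h + a * np.tanh(pre)
    # dh'/dh = diag(1-a) + a * diag(1 - tanh^2(pre)) @ W
    J = np.diag(np.full(n, 1 - a)) + a * np.diag(1 - np.tanh(pre) ** 2) @ W
    return h_new, J

# The spectral norm of prod_{s=t+1}^{T} J_s measures how much a gradient
# injected T - t steps ago is rescaled: a gate-controlled, lag-dependent
# effective learning rate, even under a fixed global step size.
T = 20
xs = rng.normal(size=(T, n))
for a in (0.1, 0.5, 1.0):                 # slow, medium, and no leak
    h, P = np.zeros(n), np.eye(n)
    for x in xs:
        h, J = step(h, x, a)
        P = J @ P
    print(f"a={a:.1f}  ||prod J||_2 after {T} steps: {np.linalg.norm(P, 2):.3e}")
```

Under these assumptions, a small gate keeps the Jacobian near the identity, preserving gradients over long lags at the cost of slow per-step adaptation, while a gate near 1 mixes quickly and attenuates long-lag gradients; multi-dimensional gates would make this rescaling direction-dependent, which is the anisotropy the abstract compares with Adam's preconditioning.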
arXiv.org Artificial Intelligence
Aug-26-2025