

 time scale





A Recurrent Neural Circuit Mechanism of Temporal-scaling Equivariant Representation

Neural Information Processing Systems

Time perception is fundamental in our daily life. An important feature of time perception is temporal scaling (TS): the ability to generate temporal sequences (e.g., movements) at different speeds. However, the mathematical principle underlying TS in the brain remains largely unknown.




high

Neural Information Processing Systems

We show that it depends on the precise way in which the limit is taken, and in particular on how the quantity of data, the hidden layer width, and the learning rate scale with d.


Dichotomy of Feature Learning and Unlearning: Fast-Slow Analysis on Neural Networks with Stochastic Gradient Descent

Imai, Shota; Nishiyama, Sota; Imaizumi, Masaaki

arXiv.org Machine Learning

The dynamics of gradient-based training in neural networks often exhibit nontrivial structures; hence, understanding them remains a central challenge in theoretical machine learning. In particular, the concept of feature unlearning, in which a neural network progressively loses previously learned features over long training, has gained attention. In this study, we consider the infinite-width limit of a two-layer neural network updated with a large-batch stochastic gradient, then derive differential equations with different time scales, revealing the mechanism and conditions under which feature unlearning occurs. Specifically, we utilize the fast-slow dynamics: while the alignment of the first-layer weights develops rapidly, the second-layer weights develop slowly. The direction of the flow on a critical manifold, determined by the slow dynamics, decides whether feature unlearning occurs. We validate the result numerically and derive theoretical grounding and scaling laws for feature unlearning. Our results yield the following insights: (i) the strength of the primary nonlinear term in the data induces feature unlearning, and (ii) the initial scale of the second-layer weights mitigates feature unlearning.
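To make the fast-slow picture concrete, here is a minimal toy sketch. It is not the paper's system of equations: the variables `x` (fast) and `y` (slow), the separation parameter `eps`, and the drift coefficient `a` are all hypothetical. The fast variable relaxes onto the critical manifold `x = y` within a time of order `eps`, after which the slow equation alone decides which way the state flows along that manifold.

```python
def simulate_fast_slow(eps, a, x0, y0, dt, steps):
    """Forward-Euler integration of a hypothetical toy fast-slow system.

    Fast:  eps * dx/dt = y - x   (x relaxes quickly onto the manifold x = y)
    Slow:        dy/dt = a * x   (drift along the manifold; sign of a sets direction)

    dt must be well below eps for the explicit Euler step to stay stable.
    Returns the list of (x, y) states, including the initial one.
    """
    x, y = x0, y0
    traj = [(x, y)]
    for _ in range(steps):
        dx = (y - x) / eps
        dy = a * x
        x, y = x + dt * dx, y + dt * dy
        traj.append((x, y))
    return traj


# a < 0: the slow flow shrinks y along the manifold; a > 0: it grows y.
decaying = simulate_fast_slow(eps=0.01, a=-0.5, x0=0.0, y0=1.0, dt=0.001, steps=5000)
growing = simulate_fast_slow(eps=0.01, a=0.5, x0=0.0, y0=1.0, dt=0.001, steps=5000)
print(decaying[-1], growing[-1])
```

In this toy the sign of `a` plays the role of "the direction of the flow on the critical manifold": the fast variable always ends up tracking `y`, but whether the tracked quantity shrinks or grows is set entirely by the slow equation.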


Creating Multi-Level Skill Hierarchies in Reinforcement Learning

Neural Information Processing Systems

What is a useful skill hierarchy for an autonomous agent? We propose an answer based on a graphical representation of how the interaction between an agent and its environment may unfold. Our approach uses modularity maximisation as a central organising principle to expose the structure of the interaction graph at multiple levels of abstraction. The result is a collection of skills that operate at varying time scales, organised into a hierarchy, where skills that operate over longer time scales are composed of skills that operate over shorter time scales. The entire skill hierarchy is generated automatically, with no human input, including the skills themselves (their behaviour, when they can be called, and when they terminate) as well as the dependency structure between them. In a wide range of environments, this approach generates skill hierarchies that are intuitively appealing and that considerably improve the learning performance of the agent.
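The abstract names modularity maximisation as the organising principle. As a hedged illustration of the quantity being maximised, the sketch below computes Newman's modularity score for a given partition of an undirected, unweighted graph. It does not perform the maximisation or build a skill hierarchy, and the example graph (two triangles joined by one edge) and its partition are hypothetical stand-ins for an interaction graph.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once
    community: dict mapping node -> community label

    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where
    L_c = number of edges inside c, d_c = total degree of nodes in c,
    and m = total number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}
print(modularity(edges, split), modularity(edges, merged))
```

Splitting along the bridge edge scores Q = 5/14, while lumping every node into one community scores 0, which is why a modularity-maximising partition tends to separate densely connected regions of the graph.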


DeepSITH: Efficient Learning via Decomposition of What and When Across Time Scales

Neural Information Processing Systems

Extracting temporal relationships over a range of scales is a hallmark of human perception and cognition, and thus it is a critical feature of machine learning applied to real-world problems. Neural networks are either plagued by the exploding/vanishing gradient problem in recurrent neural networks (RNNs) or must adjust their parameters to learn the relevant time scales (e.g., in LSTMs).
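One generic way to hold a signal's history over many time scales without learning them is a bank of leaky integrators, dF_s/dt = -s F_s + f(t), whose decay rates s are spaced logarithmically. The sketch below is only a hypothetical illustration of that idea and is not the DeepSITH architecture: the function names, the rate range, and the impulse input are all assumptions.

```python
import math


def log_spaced_rates(s_min, s_max, n):
    """n decay rates spaced evenly on a log scale between s_min and s_max (n >= 2)."""
    ratio = (s_max / s_min) ** (1 / (n - 1))
    return [s_min * ratio ** k for k in range(n)]


def run_integrator_bank(signal, rates, dt):
    """Leaky integrators dF_s/dt = -s * F_s + f(t), one per rate s.

    Each step applies the exact exponential decay exp(-s * dt) to the
    previous state and adds the input with a simple Euler term dt * f.
    Returns the final state of every integrator.
    """
    state = [0.0] * len(rates)
    decay = [math.exp(-s * dt) for s in rates]
    for f in signal:
        state = [d * F + dt * f for d, F in zip(decay, state)]
    return state


dt = 0.01
rates = log_spaced_rates(0.1, 10.0, 5)    # roughly 0.1, 0.32, 1, 3.2, 10
impulse = [1.0 / dt] + [0.0] * 999        # unit-area pulse, then silence
final = run_integrator_bank(impulse, rates, dt)
print(final)
```

About 10 time units after the pulse, the slowest integrator still holds exp(-0.999), roughly 0.37, while the fastest has decayed to essentially zero, so the pattern across the bank encodes how long ago the event happened.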