Collaborating Authors

recurrent network

VOWEL: A Local Online Learning Rule for Recurrent Networks of Probabilistic Spiking Winner-Take-All Circuits Machine Learning

Networks of spiking neurons and Winner-Take-All spiking circuits (WTA-SNNs) can detect information encoded in spatio-temporal multi-valued events. These are described by the timing of events of interest, e.g., clicks, as well as by categorical numerical values assigned to each event, e.g., like or dislike. Other use cases include object recognition from data collected by neuromorphic cameras, which produce, for each pixel, signed bits at the times of sufficiently large brightness variations. Existing schemes for training WTA-SNNs are limited to rate-encoding solutions, and are hence able to detect only spatial patterns. Developing more general training algorithms for arbitrary WTA-SNNs inherits the challenges of training (binary) Spiking Neural Networks (SNNs). These amount, most notably, to the non-differentiability of threshold functions, to the recurrent behavior of spiking neural models, and to the difficulty of implementing backpropagation in neuromorphic hardware. In this paper, we develop a variational online local training rule for WTA-SNNs, referred to as VOWEL, that leverages only local pre- and post-synaptic information for visible circuits, and an additional common reward signal for hidden circuits. The method is based on probabilistic generalized linear neural models, control variates, and variational regularization. Experimental results on real-world neuromorphic datasets with multi-valued events demonstrate the advantages of WTA-SNNs over conventional binary SNNs trained with state-of-the-art methods, especially in the presence of limited computing resources.

Synonymous Generalization in Sequence-to-Sequence Recurrent Networks Artificial Intelligence

When learning a language, people can quickly expand their understanding of the unknown content by using compositional skills, such as from two words "go" and "fast" to a new phrase "go fast." In recent work of Lake and Baroni (2017), modern Sequence-to-Sequence(seq2seq) Recurrent Neural Networks (RNNs) can make powerful zero-shot generalizations in specifically controlled experiments. However, there is a missing regarding the property of such strong generalization and its precise requirements. This paper explores this positive result in detail and defines this pattern as the synonymous generalization, an ability to recognize an unknown sequence by decomposing the difference between it and a known sequence as corresponding existing synonyms. To better investigate it, I introduce a new environment called Colorful Extended Cleanup World (CECW), which consists of complex commands paired with logical expressions. While demonstrating that sequential RNNs can perform synonymous generalizations on foreign commands, I conclude their prerequisites for success. I also propose a data augmentation method, which is successfully verified on the Geoquery (GEO) dataset, as a novel application of synonymous generalization for real cases.

Depth Enables Long-Term Memory for Recurrent Neural Networks Machine Learning

A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks which involve sequential data, is their ability to model intricate long-term temporal dependencies. However, a well established measure of RNNs long-term memory capacity is lacking, and thus formal understanding of the effect of depth on their ability to correlate data throughout time is limited. Specifically, existing depth efficiency results on convolutional networks do not suffice in order to account for the success of deep RNNs on data of varying lengths. In order to address this, we introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank, which reflects the distance of the function realized by the recurrent network from modeling no dependency between the beginning and end of the input sequence. We prove that deep recurrent networks support Start-End separation ranks which are combinatorially higher than those supported by their shallow counterparts. Thus, we establish that depth brings forth an overwhelming advantage in the ability of recurrent networks to model long-term dependencies, and provide an exemplar of quantifying this key attribute. We empirically demonstrate the discussed phenomena on common RNNs through extensive experimental evaluation using the optimization technique of restricting the hidden-to-hidden matrix to being orthogonal. Finally, we employ the tool of quantum Tensor Networks to gain additional graphic insights regarding the complexity brought forth by depth in recurrent networks.

Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

Neural Information Processing Systems

Recurrent neural networks (RNNs) are a widely used tool for modeling sequential data, yet they are often treated as inscrutable black boxes. Given a trained recurrent network, we would like to reverse engineer it--to obtain a quantitative, interpretable description of how it solves a particular task. Even for simple tasks, a detailed understanding of how recurrent networks work, or a prescription for how to develop such an understanding, remains elusive. In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task. Given a trained network, we find fixed points of the recurrent dynamics and linearize the nonlinear system around these fixed points.

Universality and individuality in neural dynamics across large populations of recurrent networks

Neural Information Processing Systems

Many recent studies have employed task-based modeling with recurrent neural networks (RNNs) to infer the computational function of different brain regions. These models are often assessed by quantitatively comparing the low-dimensional neural dynamics of the model and the brain, for example using canonical correlation analysis (CCA). However, the nature of the detailed neurobiological inferences one can draw from such efforts remains elusive. For example, to what extent does training neural networks to solve simple tasks, prevalent in neuroscientific studies, uniquely determine the low-dimensional dynamics independent of neural architectures? Or alternatively, are the learned dynamics highly sensitive to different neural architectures?

Training recurrent networks to generate hypotheses about how the brain solves hard navigation problems

Neural Information Processing Systems

Self-localization during navigation with noisy sensors in an ambiguous world is computationally challenging, yet animals and humans excel at it. In robotics, {\em Simultaneous Location and Mapping} (SLAM) algorithms solve this problem through joint sequential probabilistic inference of their own coordinates and those of external spatial landmarks. We generate the first neural solution to the SLAM problem by training recurrent LSTM networks to perform a set of hard 2D navigation tasks that require generalization to completely novel trajectories and environments. Our goal is to make sense of how the diverse phenomenology in the brain's spatial navigation circuits is related to their function. We show that the hidden unit representations exhibit several key properties of hippocampal place cells, including stable tuning curves that remap between environments.

Recurrently Controlled Recurrent Networks

Neural Information Processing Systems

Recurrent neural networks (RNNs) such as long short-term memory and gated recurrent units are pivotal building blocks across a broad spectrum of sequence modeling problems. This paper proposes a recurrently controlled recurrent network (RCRN) for expressive and powerful sequence encoding. More concretely, the key idea behind our approach is to learn the recurrent gating functions using recurrent networks. Our architecture is split into two components - a controller cell and a listener cell whereby the recurrent controller actively influences the compositionality of the listener cell. We conduct extensive experiments on a myriad of tasks in the NLP domain such as sentiment analysis (SST, IMDb, Amazon reviews, etc.), question classification (TREC), entailment classification (SNLI, SciTail), answer selection (WikiQA, TrecQA) and reading comprehension (NarrativeQA).

Recurrent networks of coupled Winner-Take-All oscillators for solving constraint satisfaction problems

Neural Information Processing Systems

We present a recurrent neuronal network, modeled as a continuous-time dynamical system, that can solve constraint satisfaction problems. Discrete variables are represented by coupled Winner-Take-All (WTA) networks, and their values are encoded in localized patterns of oscillations that are learned by the recurrent weights in these networks. Constraints over the variables are encoded in the network connectivity. Although there are no sources of noise, the network can escape from local optima in its search for solutions that satisfy all constraints by modifying the effective network connectivity through oscillations. If there is no solution that satisfies all constraints, the network state changes in a pseudo-random manner and its trajectory approximates a sampling procedure that selects a variable assignment with a probability that increases with the fraction of constraints satisfied by this assignment.

Training and Analysing Deep Recurrent Neural Networks

Neural Information Processing Systems

Time series often have a temporal hierarchy, with information that is spread out over multiple time scales. Common recurrent neural networks, however, do not explicitly accommodate such a hierarchy, and most research on them has been focusing on training algorithms rather than on their basic architecture. In this pa- per we study the effect of a hierarchy of recurrent neural networks on processing time series. Here, each layer is a recurrent network which receives the hidden state of the previous layer as input. This architecture allows us to perform hi- erarchical processing on difficult temporal tasks, and more naturally capture the structure of time series.

Enforcing balance allows local supervised learning in spiking recurrent networks

Neural Information Processing Systems

To predict sensory inputs or control motor trajectories, the brain must constantly learn temporal dynamics based on error feedback. However, it remains unclear how such supervised learning is implemented in biological neural networks. Learning in recurrent spiking networks is notoriously difficult because local changes in connectivity may have an unpredictable effect on the global dynamics. The most commonly used learning rules, such as temporal back-propagation, are not local and thus not biologically plausible. Furthermore, reproducing the Poisson-like statistics of neural responses requires the use of networks with balanced excitation and inhibition.