grus
Universal In-Context Approximation By Prompting Fully Recurrent Models
Zero-shot and in-context learning enable solving tasks without model fine-tuning, making them essential for developing generative model solutions. Therefore, it is crucial to understand whether a pretrained model can be prompted to approximate any function, i.e., whether it is a universal in-context approximator. While it was recently shown that transformer models do possess this property, these results rely on their attention mechanism. Hence, these findings do not apply to fully recurrent architectures like RNNs, LSTMs, and the increasingly popular SSMs. We demonstrate that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures such as Mamba and Hawk/Griffin can also serve as universal in-context approximators. To streamline our argument, we introduce a programming language called LSRL that compiles to these fully recurrent architectures. LSRL may be of independent interest for further studies of fully recurrent models, such as constructing interpretability benchmarks. We also study the role of multiplicative gating and observe that architectures incorporating such gating (e.g., LSTMs, GRUs, Hawk/Griffin) can implement certain operations more stably, making them more viable candidates for practical in-context universal approximation.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- (2 more...)
- Media > Music (0.47)
- Leisure & Entertainment (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Convolutional Spiking-based GRU Cell for Spatio-temporal Data
Abdennadher, Yesmine, Cicciarella, Eleonora, Rossi, Michele
Spike-based temporal messaging enables SNNs to efficiently process both purely temporal and spatio-temporal time-series or event-driven data. Combining SNNs with Gated Recurrent Units (GRUs), a variant of recurrent neural networks, gives rise to a robust framework for sequential data processing; however, traditional RNNs often lose local details when handling long sequences. Previous approaches, such as SpikGRU, fail to capture fine-grained local dependencies in event-based spatio-temporal data. In this paper, we introduce the Convolutional Spiking GRU (CS-GRU) cell, which leverages convolutional operations to preserve local structure and dependencies while integrating the temporal precision of spiking neurons with the efficient gating mechanisms of GRUs. This versatile architecture excels on both temporal datasets (NTIDIGITS, SHD) and spatio-temporal benchmarks (MNIST, DVSGesture, CIFAR10DVS). Our experiments show that CS-GRU outperforms state-of-the-art GRU variants by an average of 4.35%, achieving over 90% accuracy on sequential tasks and up to 99.31% on MNIST. It is worth noting that our solution achieves 69% higher efficiency compared to SpikGRU. The code is available at: https://github.com/YesmineAbdennadher/CS-GRU.
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.40)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.40)
- North America > Cuba (0.06)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Reviews: Preventing Gradient Explosions in Gated Recurrent Units
Summary The authors propose a method for optimizing GRU networks which aims to prevent exploding gradients. They motivate the method by showing that a constraint on the spectral norm of the state-to-state matrix keeps the dynamics of the network stable near the fixed point 0. The method is evaluated on language modelling and a music prediction task and leads to stable training in comparison to weight clipping. Technical quality The motivation of the method is well developed and it is nice that the method is evaluated on two different real-world datasets. However, one important issue I have with the evaluation is that the learning rate is not controlled for in the experiments. Unfortunately, this makes it hard to draw strong conclusions from the results.
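The spectral-norm constraint mentioned in the review can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a simple uniform rescaling of a state-to-state matrix whenever its largest singular value exceeds a bound, and the names `spectral_norm`, `rescale_to_spectral_norm`, and the parameter `max_norm` are hypothetical. The bound itself is left as an input because the review does not state its value.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def spectral_norm(W, iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(W[0]))]
    Wt = transpose(W)
    for _ in range(iters):
        u = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return 0.0  # W maps v to zero, e.g. W is the zero matrix
        v = [x / norm for x in u]
    # v is now (approximately) the top right-singular vector, with unit length
    Wv = matvec(W, v)
    return math.sqrt(sum(x * x for x in Wv))


def rescale_to_spectral_norm(W, max_norm):
    """Hypothetical helper: shrink W uniformly so its spectral norm is at most max_norm."""
    sigma = spectral_norm(W)
    if sigma <= max_norm:
        return [row[:] for row in W]  # already within the bound; return a copy
    scale = max_norm / sigma
    return [[scale * w for w in row] for row in W]
```

In a training loop, a projection of this kind would typically be applied to the recurrent weight matrix after each gradient step. Clipping individual singular values via an SVD is a common alternative to the uniform rescaling assumed here.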
Preventing Gradient Explosions in Gated Recurrent Units
Sekitoshi Kanai, Yasuhiro Fujiwara, Sotetsu Iwamura
A gated recurrent unit (GRU) is a successful recurrent neural network architecture for time-series data. The GRU is typically trained using a gradient-based method, which is subject to the exploding gradient problem in which the gradient increases significantly. This problem is caused by an abrupt change in the dynamics of the GRU due to a small variation in the parameters. In this paper, we find a condition under which the dynamics of the GRU changes drastically and propose a learning method to address the exploding gradient problem. Our method constrains the dynamics of the GRU so that it does not drastically change. We evaluated our method in experiments on language modeling and polyphonic music modeling. Our experiments showed that our method can prevent the exploding gradient problem and improve modeling accuracy.
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Europe > Czechia > South Moravian Region > Brno (0.04)
- (2 more...)
- Media > Music (0.87)
- Leisure & Entertainment (0.87)
Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Csordás, Róbert, Potts, Christopher, Manning, Christopher D., Geiger, Atticus
The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > France (0.04)
- (7 more...)
Unveiling Emotions from EEG: A GRU-Based Approach
Johari, Sarthak, Meedinti, Gowri Namratha, Delhibabu, Radhakrishnan, Joshi, Deepak
Emotion identification using EEG data is one of the most important study areas in affective computing. In this study, we test whether the Gated Recurrent Unit (GRU), a type of Recurrent Neural Network (RNN), can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.
- Asia > India > NCT > New Delhi (0.04)
- Asia > China > Beijing > Beijing (0.04)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- (7 more...)
Noise removal methods on ambulatory EEG: A Survey
Johari, Sarthak, Meedinti, Gowri Namratha, Delhibabu, Radhakrishnan, Joshi, Deepak
Research on removing noise from ambulatory EEG has been under way for many decades. An enormous number of research papers have been published on the identification and removal of noise, and it is difficult to present a detailed review of all of this literature. Therefore, this paper attempts to review the detection and removal of noise. More than 100 research papers are discussed to discern the techniques for detecting and removing noise in ambulatory EEG. Further, the literature survey shows that the pattern recognition required to detect ambulatory method, eye open and close, varies with the conditions of the EEG datasets. This is mainly because EEG recorded under different conditions has different characteristics. This, in turn, necessitates identifying a pattern recognition technique that effectively distinguishes EEG noise data from EEG data recorded under various conditions.
- Research Report > New Finding (1.00)
- Overview (0.86)
Analyzing Populations of Neural Networks via Dynamical Model Embedding
Cotler, Jordan, Tai, Kai Sheng, Hernández, Felipe, Elias, Blake, Sussillo, David
A crucial feature of neural networks with a fixed network architecture is that they form a manifold by virtue of their continuously tunable weights, which underlies their ability to be trained by gradient descent. However, this conception of the space of neural networks is inadequate for understanding the computational processes the networks perform. For example, two neural networks trained to perform the same task may have vastly different weights, and yet implement the same high-level algorithms and computational processes (Maheswaranathan et al., 2019b). In this paper, we construct an algorithm which provides alternative parametrizations of the space of RNNs and CNNs with the goal of endowing a geometric structure that is more compatible with the high-level computational processes performed by neural networks. In particular, given a set of neural networks with the same or possibly different architectures (and possibly trained on different tasks), we find a parametrization of a low-dimensional submanifold of neural networks which approximately interpolates between these chosen "base models", as well as extrapolates beyond them. We can use such model embedding spaces to cluster neural networks and even compute model averages of neural networks. A key feature is that two points in model embedding space are nearby if they correspond to neural networks which implement similar high-level computational processes, in a manner to be described later. In this way, two neural networks may correspond to nearby points in model embedding space even if those neural networks have distinct weights or even architectures.