AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sekitoshi Kanai, Yasuhiro Fujiwara, Sotetsu Iwamura

Preventing Gradient Explosions in Gated Recurrent Units

Neural Information Processing SystemsNov-21-2025, 13:47:49 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, natural language, (18 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
(2 more...)

Industry:

Media > Music (0.47)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Neural Information Processing SystemsMay-27-2025, 07:17:48 GMT

Universal In-Context Approximation By Prompting Fully Recurrent Models

architecture, recurrent model, universal in-context approximation, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsOct-8-2024, 12:50:23 GMT

Reviews: Preventing Gradient Explosions in Gated Recurrent Units

Summary The authors propose a method for optimizing GRU networks which aims to prevent exploding gradients. They motivate the method by showing that a constraint on the spectral norm of the state-to-state matrix keeps the dynamics of the network stable near the fixed point 0. The method is evaluated on language modelling and a music prediction task and leads to stable training in comparison to weight clipping. Technical quality The motivation of the method is well developed and it is nice that the method is evaluated on two different real-world datasets. However, one important issue I have with the evaluation is that the learning rate is not controlled for in the experiments. Unfortunately, this makes it hard to draw strong conclusions from the results.

gated recurrent unit, grus, preventing gradient explosion, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.65)

Sekitoshi Kanai, Yasuhiro Fujiwara, Sotetsu Iwamura

Preventing Gradient Explosions in Gated Recurrent Units

Neural Information Processing SystemsOct-4-2024, 10:17:58 GMT

A gated recurrent unit (GRU) is a successful recurrent neural network architecture for time-series data. The GRU is typically trained using a gradient-based method, which is subject to the exploding gradient problem in which the gradient increases significantly. This problem is caused by an abrupt change in the dynamics of the GRU due to a small variation in the parameters. In this paper, we find a condition under which the dynamics of the GRU changes drastically and propose a learning method to address the exploding gradient problem. Our method constrains the dynamics of the GRU so that it does not drastically change. We evaluated our method in experiments on language modeling and polyphonic music modeling. Our experiments showed that our method can prevent the exploding gradient problem and improve modeling accuracy.

gradient, gru, recurrent neural network, (14 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Czechia > South Moravian Region > Brno (0.04)
(2 more...)

Genre: Research Report (0.34)

Industry:

Media > Music (0.87)
Leisure & Entertainment (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Csordás, Róbert, Potts, Christopher, Manning, Christopher D., Geiger, Atticus

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

arXiv.org Artificial IntelligenceAug-20-2024

The Linear Representation Hypothesis (LRH) states that neural networks learn to encode concepts as directions in activation space, and a strong version of the LRH states that models learn only such encodings. In this paper, we present a counterexample to this strong LRH: when trained to repeat an input token sequence, gated recurrent neural networks (RNNs) learn to represent the token at each position with a particular order of magnitude, rather than a direction. These representations have layered features that are impossible to locate in distinct linear subspaces. To show this, we train interventions to predict and manipulate tokens by learning the scaling factor corresponding to each sequence position. These interventions indicate that the smallest RNNs find only this magnitude-based solution, while larger RNNs have linear representations. These findings strongly indicate that interpretability research should not be confined by the LRH.

intervention, representation, sequence, (15 more...)

2408.1092

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France (0.04)
(7 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Johari, Sarthak, Meedinti, Gowri Namratha, Delhibabu, Radhakrishnan, Joshi, Deepak

Unveiling Emotions from EEG: A GRU-Based Approach

arXiv.org Artificial IntelligenceJul-20-2023

One of the most important study areas in affective computing is emotion identification using EEG data. In this study, the Gated Recurrent Unit (GRU) algorithm, which is a type of Recurrent Neural Networks (RNNs), is tested to see if it can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.

eeg data, emotion recognition, recognition, (12 more...)

2308.02778

Country:

Asia > India > NCT > New Delhi (0.04)
Asia > China > Beijing > Beijing (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)

Johari, Sarthak, Meedinti, Gowri Namratha, Delhibabu, Radhakrishnan, Joshi, Deepak

Noise removal methods on ambulatory EEG: A Survey

arXiv.org Artificial IntelligenceJul-16-2023

Over many decades, research is being attempted for the removal of noise in the ambulatory EEG. In this respect, an enormous number of research papers is published for identification of noise removal, It is difficult to present a detailed review of all these literature. Therefore, in this paper, an attempt has been made to review the detection and removal of an noise. More than 100 research papers have been discussed to discern the techniques for detecting and removal the ambulatory EEG. Further, the literature survey shows that the pattern recognition required to detect ambulatory method, eye open and close, varies with different conditions of EEG datasets. This is mainly due to the fact that EEG detected under different conditions has different characteristics. This is, in turn, necessitates the identification of pattern recognition technique to effectively distinguish EEG noise data from a various condition of EEG data.

artificial intelligence, deep learning, machine learning, (18 more...)

2308.02437

Country:

Asia > India > NCT > New Delhi (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (0.86)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cotler, Jordan, Tai, Kai Sheng, Hernández, Felipe, Elias, Blake, Sussillo, David

Analyzing Populations of Neural Networks via Dynamical Model Embedding

arXiv.org Artificial IntelligenceFeb-27-2023

A crucial feature of neural networks with a fixed network architecture is that they form a manifold by virtue of their continuously tunable weights, which underlies their ability to be trained by gradient descent. However, this conception of the space of neural networks is inadequate for understanding the computational processes the networks perform. For example, two neural networks trained to perform the same task may have vastly different weights, and yet implement the same high-level algorithms and computational processes (Maheswaranathan et al., 2019b). In this paper, we construct an algorithm which provides alternative parametrizations of the space of RNNs and CNNs with the goal of endowing a geometric structure that is more compatible with the high-level computational processes performed by neural networks. In particular, given a set of neural networks with the same or possibly different architectures (and possibly trained on different tasks), we find a parametrization of a low-dimensional submanifold of neural networks which approximately interpolates between these chosen "base models", as well as extrapolates beyond them. We can use such model embedding spaces to cluster neural networks and even compute model averages of neural networks. A key feature is that two points in model embedding space are nearby if they correspond to neural networks which implement similar high-level computational processes, in a manner to be described later. In this way, two neural networks may correspond to nearby points in model embedding space even if those neural networks have distinct weights or even architectures.

artificial intelligence, base model, machine learning, (18 more...)

2302.14078

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

El-Naggar, Nadine, Madhyastha, Pranava, Weyde, Tillman

Exploring the Long-Term Generalization of Counting Behavior in RNNs

arXiv.org Artificial IntelligenceNov-29-2022

In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity for counting with suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this and some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than in the training data. ReLUs show much larger variance in behavior and in most cases worse generalization. The long sequence generalization is empirically related to validation loss, but reliable long sequence generalization seems not practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that the saturation of activation functions in LSTMs and the correct weight setting for ReLUs to generalize counting behavior are not achieved in standard training regimens. In summary, learning generalizable counting behavior is still an open problem and we discuss potential approaches for further research.

artificial intelligence, machine learning, sequence, (17 more...)

2211.16429

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)