AITopics | delta rnn

Collaborating Authors

delta rnn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Details and Ablation Studies for Language Modelling

Neural Information Processing SystemsApr-25-2026, 14:23:28 GMT

A.1 Experimental Settings All language models in Table 1 have the same Transformer configuration: a 16-layer model with a hidden size of 128 with 8 heads, and a feed-forward dimension of 2048. We use a dropout [75, 76, 77] rate of 0.1. The batch size is 96 and we train for about 120 epochs with Adam optimiser [78] with an initial learning rate of 0.00025 and 2000 learning rate warm-up steps. All models are trained with a back-propagation span of 256 tokens. During training, these segments are treated independently, except for the + full context cases in Table 1 where the states (both recurrent states and fast weight states) from a segment are used as initialisation for the subsequent segment. The models in + full context cases are also evaluated in the same way by carrying over the context throughout the evaluation text with a batch size of one. For all other cases, the evaluation is done by going through the text with a sliding window of size 256 with a batch size of one. Transformer states are computed for all positions in each window, but only the last position is used to compute perplexity (except in the first segment where all positions are used for evaluation) [2].

artificial intelligence, delta rnn, machine learning, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.42)

Add feedback

3f9e3767ef3b10a0de4c256d7ef9805d-Supplemental.pdf

Neural Information Processing SystemsFeb-8-2026, 08:44:56 GMT

delta net, delta rnn, lstm, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Add feedback

Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training

Chen, Xi, Gao, Chang, Wang, Zuowen, Cheng, Longbiao, Zhou, Sheng, Liu, Shih-Chii, Delbruck, Tobi

arXiv.org Artificial IntelligenceDec-14-2023

Recurrent Neural Networks (RNNs) are useful in temporal sequence tasks. However, training RNNs involves dense matrix multiplications which require hardware that can support a large number of arithmetic operations and memory accesses. Implementing online training of RNNs on the edge calls for optimized algorithms for an efficient deployment on hardware. Inspired by the spiking neuron model, the Delta RNN exploits temporal sparsity during inference by skipping over the update of hidden states from those inactivated neurons whose change of activation across two timesteps is below a defined threshold. This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge. Due to the symmetric computation graphs of forward and backward propagation during training, the gradient computation of inactivated neurons can be skipped. Results show a reduction of $\sim$80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show 2-10X speedup in matrix computations for an activation sparsity range of 50%-90%. Additionally, we show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.

gradient, propagation, sparsity, (14 more...)

arXiv.org Artificial Intelligence

2312.09391

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Education > Educational Setting > Online (0.54)
Energy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback