AITopics | sparse attentive backtracking

The T = 100, it is clear that T grows. SAB still to complete T = 5000, whereas T = 2000 both v self-attention 1/8 = 12.5%).

bptt, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.39)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Neural Information Processing Systems | Nov-20-2025, 23:08:08 GMT

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.

name change, sparse attentive backtracking, temporal credit assignment, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Nan Rosemary Ke, Anirudh Goyal ALIAS PARTH GOYAL, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio

Neural Information Processing Systems | Nov-20-2025, 20:47:07 GMT

However, humans are often reminded of past memories or mental states which are associated with the current mental state.

credit assignment, mechanism, sequence, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.82)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Neural Information Processing Systems | Oct-8-2024, 19:48:33 GMT

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state.

long sequence, sparse attentive backtracking, temporal credit assignment, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Reviews: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Neural Information Processing Systems | Oct-8-2024, 07:35:42 GMT

The authors augment an RNN with skip connections in time that are sparsely gated by learnable attention. This allows one to reap the benefits of full BPTT while effectively using only truncated BPTT. While other forms of attention-gated skip connections in time have been suggested before, to which the authors compare, here the authors looked at sparse (still differentiable) retrieval where only a few top memory entries are selected, enabling the benefits of backpropagating over only a few selected earlier states. Overall, I think this work is very significant, both for enabling faster implementations of BPTT when considering long time horizons and for suggesting future directions for how the brain might perform credit assignment and for pointing out further brain strategies / biases to employ in machine learning. With some clarifications / changes as below, I recommend the acceptance of this article for NIPS.

1. In lines 58-60, the authors say that BPTT would require "playing back these events".

bptt, sparse attentive backtracking, temporal credit assignment, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Ke, Nan Rosemary, GOYAL, Anirudh Goyal ALIAS PARTH, Bilaniuk, Olexa, Binas, Jonathan, Mozer, Michael C., Pal, Chris, Bengio, Yoshua

Neural Information Processing Systems | Feb-14-2020, 19:56:34 GMT

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state.

long sequence, sparse attentive backtracking, temporal credit assignment, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Ke, Nan Rosemary, GOYAL, Anirudh Goyal ALIAS PARTH, Bilaniuk, Olexa, Binas, Jonathan, Mozer, Michael C., Pal, Chris, Bengio, Yoshua

Neural Information Processing Systems | Dec-31-2018

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.

artificial intelligence, machine learning, sequence, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Ke, Nan Rosemary, GOYAL, Anirudh Goyal ALIAS PARTH, Bilaniuk, Olexa, Binas, Jonathan, Mozer, Michael C., Pal, Chris, Bengio, Yoshua

Neural Information Processing Systems | Dec-31-2018

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.

artificial intelligence, machine learning, sequence, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding

Ke, Nan Rosemary, Goyal, Anirudh, Bilaniuk, Olexa, Binas, Jonathan, Mozer, Michael C., Pal, Chris, Bengio, Yoshua

arXiv.org Machine Learning | Sep-11-2018

Learning long-term dependencies in extended temporal sequences requires credit assignment to events far back in the past. The most common method for training recurrent neural networks, back-propagation through time (BPTT), requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years.) However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Machine Learning

1809.03702

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Colorado > Boulder County > Boulder (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sparse Attentive Backtracking: Long-Range Credit Assignment in Recurrent Networks

Ke, Nan Rosemary, Goyal, Anirudh, Bilaniuk, Olexa, Binas, Jonathan, Charlin, Laurent, Pal, Chris, Bengio, Yoshua

arXiv.org Machine Learning | Nov-7-2017

A major drawback of backpropagation through time (BPTT) is the difficulty of learning long-term dependencies, coming from having to propagate credit information backwards through every single step of the forward computation. This makes BPTT both computationally impractical and biologically implausible. For this reason, full backpropagation through time is rarely used on long sequences, and truncated backpropagation through time is used as a heuristic. However, this usually leads to biased estimates of the gradient in which longer term dependencies are ignored. Addressing this issue, we propose an alternative algorithm, Sparse Attentive Backtracking, which might also be related to principles used by brains to learn long-term dependencies. Sparse Attentive Backtracking learns an attention mechanism over the hidden states of the past and selectively backpropagates through paths with high attention weights. This allows the model to learn long term dependencies while only backtracking for a small number of time steps, not just from the recent past but also from attended relevant past states.
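
The selective attention over past hidden states described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `sparse_topk_attention`, the dot-product scoring, and the softmax restricted to the top `k` entries are all assumptions chosen for illustration. The point it shows is that past states outside the top `k` get an attention weight of exactly zero, so in a differentiable framework no gradient would flow back to them through the attention path.

```python
import math


def sparse_topk_attention(query, past_states, k):
    """Attend over past hidden states, keeping only the k highest-scoring ones.

    Hypothetical helper for illustration. Returns (summary, weights), where
    weights[i] == 0.0 marks a past state that is ignored entirely.
    """
    if not past_states:
        raise ValueError("past_states must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")

    # Assumed scoring rule: dot product between current and past state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in past_states]

    # Indices of the k largest scores (ties broken by earliest index).
    k = min(k, len(scores))
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]

    # Numerically stable softmax restricted to the selected indices.
    max_score = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - max_score) for i in top}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0 for i in range(len(scores))]

    # Weighted sum of past states; unselected states contribute nothing.
    dim = len(past_states[0])
    summary = [
        sum(w * state[d] for w, state in zip(weights, past_states))
        for d in range(dim)
    ]
    return summary, weights


if __name__ == "__main__":
    current = [1.0, 0.0]
    memory = [[0.0, 1.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
    summary, weights = sparse_topk_attention(current, memory, k=2)
    print(weights)   # only the entries at indices 1 and 2 are non-zero
    print(summary)
```

With `k` much smaller than the number of stored states, the cost of the backward pass through attention depends on `k` rather than on the sequence length, which is the trade-off the abstract points to.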

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1711.02326

Country: North America > Canada (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
