Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding
Nan Rosemary Ke, Anirudh Goyal ALIAS PARTH GOYAL, Olexa Bilaniuk, Jonathan Binas, Michael C. Mozer, Chris Pal, Yoshua Bengio
–Neural Information Processing Systems
The T = 100, it is clear that T grows. SAB still to complete T = 5000, whereas T = 2000 both v self-attention 1/8 = 12.5%).
Feb-14-2026, 23:59:56 GMT
- Country:
- North America
- Canada > Quebec
- Montreal (0.05)
- United States > Colorado
- Boulder County > Boulder (0.04)
- Technology: