ExplainMySurprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes
Neural Information Processing Systems
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This requires storing prohibitively large intermediate data if a sequence consists of thousands or even millions of elements, and as a result makes learning of very long-term dependencies infeasible.
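The memory cost described above can be illustrated with a minimal sketch: a toy recurrent network whose forward pass caches every hidden state, as standard backpropagation through time requires. The names `Wx`, `Wh`, and `rnn_forward` are illustrative assumptions, not from the paper; the point is only that the cache grows linearly with sequence length `T`.

```python
import numpy as np

def rnn_forward(xs, h0, Wx, Wh):
    """Toy RNN forward pass that caches every hidden state.

    Standard backprop-through-time needs these cached activations
    to compute gradients, so memory grows linearly with the
    sequence length T. (Illustrative sketch, not the paper's method.)
    """
    h, cache = h0, []
    for x in xs:
        h = np.tanh(x @ Wx + h @ Wh)
        cache.append(h)  # one stored activation per timestep
    return h, cache

rng = np.random.default_rng(0)
T, d = 1000, 8  # sequence length, hidden size
xs = rng.normal(size=(T, d))
h, cache = rnn_forward(xs, np.zeros(d),
                       rng.normal(size=(d, d)) * 0.1,
                       rng.normal(size=(d, d)) * 0.1)
print(len(cache))  # cache size scales with T
```

For a sequence of a million elements, the cache would hold a million hidden states, which motivates methods that select only a few events to remember.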