A Local Temporal Difference Code for Distributional Reinforcement Learning
Recent theoretical and experimental results suggest that the dopamine system implements distributional temporal difference backups, allowing learning of the entire distribution of the long-run values of states rather than just their expected values. However, the distributional codes explored so far rely on a complex imputation step that is crucially non-local: in order to compute reward prediction errors, units must know not only their own state but also the states of the other units. It is far from clear how these steps could be implemented in realistic neural circuits. Here, we introduce the Laplace code: a local temporal difference code for distributional reinforcement learning that is representationally powerful and computationally straightforward. The code decomposes value distributions and prediction errors across three separate dimensions: reward magnitude (related to distributional quantiles), temporal discounting (related to the Laplace transform of future rewards), and time horizon (related to eligibility traces). Besides lending itself to a local learning rule, the decomposition recovers the temporal evolution of the immediate reward distribution, indicating all possible rewards at all future times. This increases representational capacity and allows for temporally flexible computations that immediately adjust to changing horizons or discount factors.
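To make the locality claim concrete, here is a minimal illustrative sketch (not the paper's exact algorithm) of a population of TD units indexed by a discount factor `gamma` and a reward-magnitude threshold `h`. Each hypothetical unit learns V_{gamma,h}(s) ≈ E[Σ_t gamma^t · 1{r_t > h}], the discounted expected count of future rewards exceeding its threshold, and its update uses only its own values at the current and next state — no imputation from other units. The function name, episode format, and hyperparameters are assumptions for illustration.

```python
import numpy as np

def local_td_laplace(episodes, n_states, gammas, thresholds,
                     alpha=0.1, n_sweeps=2000, seed=0):
    """Illustrative local TD rule over a (gamma, threshold) grid of units.

    episodes: list of episodes, each a list of (s, r, s_next) transitions,
              with s_next = None at termination.
    Returns V with shape (len(gammas), len(thresholds), n_states).
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((len(gammas), len(thresholds), n_states))
    for _ in range(n_sweeps):
        episode = episodes[rng.integers(len(episodes))]
        for s, r, s_next in episode:
            for gi, gamma in enumerate(gammas):
                for hi, h in enumerate(thresholds):
                    # Thresholded reward: did this step's reward exceed h?
                    target = float(r > h)
                    if s_next is not None:
                        target += gamma * V[gi, hi, s_next]
                    # Local update: unit (gi, hi) touches only its own values.
                    V[gi, hi, s] += alpha * (target - V[gi, hi, s])
    return V

# Example: deterministic 3-state chain 0 -> 1 -> 2, reward 1.0 on the last step.
episodes = [[(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)]]
V = local_td_laplace(episodes, n_states=3,
                     gammas=[0.5, 0.9], thresholds=[0.5])
# For each unit, V[gi, 0, 0] approaches gamma**2 (reward arrives two steps out).
```

Reading across the `gammas` axis gives samples of the Laplace transform of the future-reward timeline, which is what allows the temporal structure of rewards to be recovered from the population.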
Review for NeurIPS paper: A Local Temporal Difference Code for Distributional Reinforcement Learning
Clarity: This is my biggest issue with this paper: it is _very_ difficult to follow, and most of the figures are difficult to interpret. In more detail:
- Overall, there are too many references to the supplemental material (e.g. "see SM-C") for things that are necessary for understanding the main paper.
- What do the bar plots on top of the grid represent? What are the dark and grey lines in the right plot meant to represent?
Meta-review for NeurIPS paper: A Local Temporal Difference Code for Distributional Reinforcement Learning
The reviewers appreciated the interesting and novel contribution made here. However, Reviewers 2 and 4 expressed serious concerns about the legibility of the paper. To quote the discussion, "Someone not familiar with either distributional RL or neuroscience will be lost when reading this paper." The question is therefore whether these issues can be resolved during this conference cycle. I believe they can, though it will require significant editing; I also think it is important to support interdisciplinary work.