Action-modulated midbrain dopamine activity arises from distributed control policies

Dec-23-2025, 22:11:29 GMT, Neural Information Processing Systems

Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in classic models) and action surprise, a measure of how unexpected an action is relative to the basal ganglia's current policy. In the presence of the action surprise term, the model implements an approximate form of $Q$-learning.
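
The relationship between a reward prediction error, an action-dependent term, and $Q$-learning can be illustrated with a small tabular sketch. The abstract does not give the model's equations, so everything below is an assumption made for illustration: the greedy state value $V(s) = \max_a Q(s, a)$, the function names, and the toy table are all hypothetical, and the "surprise-like" term used here, $V(s) - Q(s, a)$, is a stand-in rather than the paper's definition of action surprise. The sketch only shows the algebraic point that a standard $Q$-learning error splits exactly into a state-value prediction error plus a term that grows as the taken action departs from the currently preferred one.

```python
# Illustrative sketch only: NOT the paper's model. The greedy value V(s) = max_a Q(s, a)
# and the "surprise-like" term V(s) - Q(s, a) are assumptions chosen for this example.

def q_learning_error(Q, s, a, r, s_next, gamma=0.9):
    """Standard tabular Q-learning TD error for one transition (s, a, r, s_next)."""
    return r + gamma * max(Q[s_next]) - Q[s][a]


def decomposed_error(Q, s, a, r, s_next, gamma=0.9):
    """Split the same error into a state-value RPE and a surprise-like term."""
    v_s = max(Q[s])            # value of the currently preferred action in s
    v_next = max(Q[s_next])    # value of the currently preferred action in s_next
    rpe = r + gamma * v_next - v_s   # prediction error on state values only
    surprise = v_s - Q[s][a]         # zero for the preferred action, positive otherwise
    return rpe, surprise


# Hypothetical two-state, two-action value table.
Q = [[1.0, 0.5],
     [0.0, 2.0]]

# The agent takes the non-preferred action (a=1) in state 0.
delta = q_learning_error(Q, s=0, a=1, r=0.0, s_next=1)
rpe, surprise = decomposed_error(Q, s=0, a=1, r=0.0, s_next=1)
print(delta, rpe, surprise, rpe + surprise)
```

In this toy decomposition the surprise-like term vanishes whenever the taken action is the one the value table currently prefers, so the error reduces to a state-value prediction error; when another controller selects a different action, the extra term is what makes the sum match the $Q$-learning error.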



Industry:
- Health & Medicine (0.87)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (0.42)