Action-modulated midbrain dopamine activity arises from distributed control policies
–Neural Information Processing Systems
Animal behavior is driven by multiple brain regions working in parallel with distinct control policies. We present a biologically plausible model of off-policy reinforcement learning in the basal ganglia, which enables learning in such an architecture. The model accounts for action-related modulation of dopamine activity that is not captured by previous models that implement on-policy algorithms. In particular, the model predicts that dopamine activity signals a combination of reward prediction error (as in classic models) and action surprise, a measure of how unexpected an action is relative to the basal ganglia's current policy. In the presence of the action surprise term, the model implements an approximate form of $Q$-learning.
Neural Information Processing Systems
Dec-23-2025, 22:11:29 GMT