Dopamine Bonuses

Kakade, Sham, Dayan, Peter

Neural Information Processing Systems 

Substantial data support a temporal difference (TO) model of dopamine (OA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, OA activity seems anomalous under the TO model, responding to non-rewarding stimuli. We address these anomalies by suggesting that OA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al's non-distorting shaping bonuses. We interpret this additional role for OA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration. 1 Introduction Much evidence suggests that dopamine cells in the primate midbrain play an important role in reward and action learning. Electrophysiological studies support a theory that OA cells signal a global prediction error for summed future reward in appetitive conditioning tasks (Montague et al, 1996; Schultz et al, 1997), in the form of a temporal difference prediction error term.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found