Dopamine Bonuses
–Neural Information Processing Systems
Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al.'s non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.

1 Introduction

Much evidence suggests that dopamine cells in the primate midbrain play an important role in reward and action learning. Electrophysiological studies support a theory that DA cells signal a global prediction error for summed future reward in appetitive conditioning tasks (Montague et al., 1996; Schultz et al., 1997), in the form of a temporal difference prediction error term.
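As a rough illustration of the quantities named above, the following Python sketch computes a temporal difference prediction error and adds a potential-based shaping bonus of the kind associated with Ng et al. It is not taken from the paper: the function names, the discount parameter `gamma` (defaulting to 1.0, i.e. undiscounted), and the potential values in `phi` are all assumptions chosen for illustration.

```python
def td_error(reward, value_now, value_next, gamma=1.0):
    """Temporal difference prediction error: r + gamma * V(next) - V(now)."""
    return reward + gamma * value_next - value_now


def shaping_bonus(phi_now, phi_next, gamma=1.0):
    """Potential-based shaping bonus: gamma * phi(next) - phi(now)."""
    return gamma * phi_next - phi_now


def td_error_with_bonus(reward, value_now, value_next, phi_now, phi_next, gamma=1.0):
    """TD error computed on the reward plus the shaping bonus."""
    bonus = shaping_bonus(phi_now, phi_next, gamma)
    return td_error(reward + bonus, value_now, value_next, gamma)


# Hypothetical potentials along a four-state trajectory (illustrative values only).
phi = [0.0, 0.5, 1.0, 0.0]

# In the undiscounted case the bonuses telescope, so their sum over the
# trajectory depends only on the first and last potentials.
total_bonus = sum(shaping_bonus(phi[t], phi[t + 1]) for t in range(len(phi) - 1))
```

Under these assumptions, an unpredicted reward yields a positive error, a fully predicted reward yields zero error, and the shaping bonuses summed along the trajectory reduce to `phi[-1] - phi[0]`, which is one way to see why such bonuses are described as non-distorting.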
Dec-31-2001
- Country:
- Europe > United Kingdom
- England > Greater London > London (0.04)
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.05)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.95)
- Technology: