Environmental statistics and the trade-off between model-based and TD learning in humans
Simon, Dylan A., Daw, Nathaniel D.
Neural Information Processing Systems
There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence, especially in humans, as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
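The link between reward statistics and learning rate can be illustrated with a small sketch. It is not the authors' analysis. It assumes a simplified setting in which the mean reward follows a Gaussian random walk (step variance standing in for volatility) and each observed reward adds independent Gaussian noise. Under those assumptions, the steady-state Kalman gain is the learning rate an ideal delta-rule learner would settle on. The function name and parameter names are hypothetical.

```python
import math


def steady_state_learning_rate(volatility_var: float, noise_var: float) -> float:
    """Steady-state Kalman gain for a scalar random walk observed with noise.

    Assumed model (an illustration, not taken from the paper):
        mean_t   = mean_{t-1} + step,   step  ~ N(0, volatility_var)
        reward_t = mean_t + error,      error ~ N(0, noise_var)
    """
    if volatility_var <= 0 or noise_var <= 0:
        raise ValueError("both variances must be positive")
    q, r = volatility_var, noise_var
    # The steady-state prior variance P solves P^2 - q*P - q*r = 0.
    prior_var = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    # The gain plays the role of the learning rate in a delta-rule update.
    return prior_var / (prior_var + r)


if __name__ == "__main__":
    conditions = [
        ("high volatility, low noise", 1.0, 0.01),
        ("low volatility, high noise", 0.01, 1.0),
    ]
    for label, q, r in conditions:
        rate = steady_state_learning_rate(q, r)
        print(f"{label}: learning rate = {rate:.3f}")
```

In this toy model the gain approaches 1 when volatility dominates noise and approaches 0 in the opposite regime, which matches the abstract's remark that the conditions favoring model-based RL also promote a high learning rate.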
Dec-31-2011