Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Feb-14-2020, 16:26:00 GMT–Neural Information Processing Systems

How humans make repeated choices among options with imperfectly known reward outcomes is an important problem in psychology and neuroscience. This is often studied using multi-armed bandits, which is also frequently studied in machine learning. We present data from a human stationary bandit experiment, in which we vary the average abundance and variability of reward availability (mean and variance of reward rate distributions). Surprisingly, we find subjects significantly underestimate prior mean of reward rates -- based on their self-report, at the end of a game, on their reward expectation of non-chosen arms. Previously, human learning in the bandit task was found to be well captured by a Bayesian ideal learning model, the Dynamic Belief Model (DBM), albeit under an incorrect generative assumption of the temporal structure - humans assume reward rates can change over time even though they are actually fixed.

bandit task, pessimism bias, reward rate, (7 more...)

Neural Information Processing Systems

Feb-14-2020, 16:26:00 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.63)