AITopics | multi-armed bandit task

Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Dalin Guo, Angela J. Yu

Neural Information Processing Systems, Feb-15-2026, 07:07:36 GMT

Neural Information Processing Systems http://nips.cc/

dbm, probability, reward rate, (15 more...)

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.47)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.65)

Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Neural Information Processing Systems, Nov-20-2025, 23:16:00 GMT

How humans make repeated choices among options with imperfectly known reward outcomes is an important problem in psychology and neuroscience. This problem is often studied using the multi-armed bandit task, which is also frequently studied in machine learning. We present data from a human stationary bandit experiment, in which we vary the average abundance and variability of reward availability (mean and variance of reward rate distributions). Surprisingly, we find that subjects significantly underestimate the prior mean of reward rates, based on their self-report, at the end of a game, of their reward expectation for non-chosen arms. Previously, human learning in the bandit task was found to be well captured by a Bayesian ideal learning model, the Dynamic Belief Model (DBM), albeit under an incorrect generative assumption about the temporal structure: humans assume reward rates can change over time even though they are actually fixed. We find that the pessimism bias in the bandit task is well captured by the prior mean of DBM when fitted to human choices, but it is poorly captured by the prior mean of the Fixed Belief Model (FBM), an alternative Bayesian model that (correctly) assumes reward rates to be constant. This pessimism bias is also incompletely captured by a simple reinforcement learning model (RL) commonly used in neuroscience and psychology, in terms of fitted initial Q-values. While it seems sub-optimal, and thus mysterious, that humans have an underestimated prior reward expectation, our simulations show that an underestimated prior mean helps to maximize long-term gain if the observer assumes volatility when reward rates are stable and uses a softmax decision policy instead of the optimal one (obtainable by dynamic programming). This raises the intriguing possibility that the brain underestimates reward rates to compensate for the incorrect non-stationarity assumption in the generative model and a simplified decision policy.
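The models named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary rewards, a reward rate discretized on a grid, and the usual formulation of DBM in which the rate persists with probability `gamma` and is otherwise redrawn from the prior (so `gamma = 1` reduces to FBM). The function names, the grid size, the `gamma` and `inv_temp` values, and the delta-rule form of the RL update are all assumptions chosen for illustration.

```python
import math

def make_grid(n=200):
    """Midpoints of n equal bins on (0, 1): candidate reward rates theta."""
    return [(i + 0.5) / n for i in range(n)]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def beta_prior(grid, a, b):
    """Discretized Beta(a, b) prior over the reward rate (illustrative choice)."""
    return normalize([t ** (a - 1) * (1 - t) ** (b - 1) for t in grid])

def predictive_prior(belief, prior, gamma):
    """Assumed DBM step: the rate persists with probability gamma,
    otherwise it is redrawn from the prior. gamma = 1 gives FBM."""
    return [gamma * b + (1 - gamma) * p for b, p in zip(belief, prior)]

def update(belief, prior, grid, reward, gamma):
    """One trial on the chosen arm: mix toward the prior, then apply Bayes' rule
    with a Bernoulli likelihood (reward is 1 or 0)."""
    pred = predictive_prior(belief, prior, gamma)
    like = [t if reward else 1 - t for t in grid]
    return normalize([l * q for l, q in zip(like, pred)])

def expected_rate(belief, grid):
    """Mean of the belief distribution over the reward rate."""
    return sum(b * t for b, t in zip(belief, grid))

def softmax(values, inv_temp):
    """Softmax decision policy over per-arm expected reward rates."""
    m = max(values)  # subtract the max for numerical stability
    return normalize([math.exp(inv_temp * (v - m)) for v in values])

def q_update(q, reward, lr):
    """Standard delta-rule update; the initial q plays the role of a prior mean."""
    return q + lr * (reward - q)

grid = make_grid()
prior = beta_prior(grid, 1, 1)  # uniform prior, prior mean 0.5
fbm = dbm = prior
for r in [1, 1, 1]:  # three rewarded pulls on the same arm
    fbm = update(fbm, prior, grid, r, gamma=1.0)
    dbm = update(dbm, prior, grid, r, gamma=0.8)

fbm_mean = expected_rate(fbm, grid)
dbm_mean = expected_rate(dbm, grid)
# Choice probabilities between the observed arm and an unobserved arm at the prior mean.
choice_probs = softmax([dbm_mean, 0.5], inv_temp=5.0)
```

In this sketch the DBM estimate stays closer to the prior mean than the FBM estimate after the same observations, because every trial mixes some prior mass back into the belief. That is why the assumed prior mean keeps influencing choices under DBM, which is the quantity the abstract reports as underestimated.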

bandit task, pessimism bias, reward rate, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Dalin Guo, Angela J. Yu

Neural Information Processing Systems, Nov-20-2025, 21:17:11 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, data mining, machine learning, (20 more...)

Reviews: Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task

Neural Information Processing Systems, Oct-8-2024, 10:06:44 GMT

This paper presents an intriguing computational dissection of a particular form of reward rate underestimation in a bandit task (which the authors call "pessimism bias"). Modeling suggests that this bias can be accounted for by a Bayesian model that assumes (erroneously) that reward rates are dynamic. The paper is well written and the methods are sound. I think it could do a better job of relating to previous literature, and there are some questions about the modeling and behavioral analysis, which I detail below. Specific comments: I was surprised that there was no mention of Gershman & Niv (2015, Topics in Cognitive Science), which is one of the only papers I'm aware of that manipulates reward abundance.

multi-armed bandit task, pessimism bias, reward rate, (8 more...)

Genre: Summary/Review (0.40)

Technology: