Bandit Social Learning: Exploration under Myopic Behavior

Banihashem, Kiarash, Hajiaghayi, MohammadTaghi, Shin, Suho, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence 

Reviews and ratings are pervasive in many online platforms. A customer consults reviews/ratings, then chooses a product and then (often) leaves feedback, which is aggregated by the platform and served to future customers. Collectively, customers face a tradeoff between exploration and exploitation, i.e., between acquiring new information while making potentially suboptimal decisions and making optimal decisions using available information. However, individual customers tend to act myopically and favor exploitation, without regards to exploration for the sake of the others. On a high level, we ask whether/how the myopic behavior interferes with efficient exploration. We are particularly interested in learning failures when only a few agents choose an optimal action.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found