Imitation Upper Confidence Bound for Bandits on a Graph

Lupu, Andrei (McGill University) | Precup, Doina (McGill University)

Feb-8-2018–AAAI Conferences

We consider a graph of interconnected agents implementing a common policy, each playing a bandit problem with identical reward distributions. We restrict the information propagated in the graph so that agents can observe only each other's actions. We propose an extension of the Upper Confidence Bound (UCB) algorithm to this setting and empirically demonstrate that our solution improves performance over UCB according to multiple metrics and within various graph configurations.
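For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of standard single-agent UCB1 on a Bernoulli bandit. It is not the authors' imitation extension, which this listing does not describe; the function names `ucb1_select` and `run_ucb1`, the Bernoulli reward model, and the exploration constant 2 are assumptions made for illustration only.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm chosen by standard UCB1 at round t (1-indexed).

    counts[a] is the number of pulls of arm a, sums[a] its total reward.
    """
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical mean plus exploration bonus sqrt(2 ln t / n).
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(means, horizon, seed=0):
    """Simulate one agent on a Bernoulli bandit (assumed reward model).

    Returns the number of pulls per arm after `horizon` rounds.
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    sums = [0.0] * len(means)
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Calling `run_ucb1([0.2, 0.8], horizon=1000)` returns per-arm pull counts for a single isolated agent. In the setting the abstract describes, each node of the graph would run a policy of this kind while additionally observing its neighbours' chosen arms, but how those observations enter the index is specific to the paper and is not sketched here.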

artificial intelligence, big data, data mining


Country:
- North America > Canada > Quebec > Montreal (0.15)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.54)
  - Artificial Intelligence > Representation & Reasoning
    - Agents (0.72)
