Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs

Nguyen, Duc Thien (Singapore Management University) | Yeoh, William (New Mexico State University) | Lau, Hoong Chuin (Singapore Management University) | Zilberstein, Shlomo (University of Massachusetts, Amherst) | Zhang, Chongjie (Massachusetts Institute of Technology)

Jul-14-2014–AAAI Conferences

Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, an assumption that might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) we introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) we introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) we empirically evaluate them against an existing multi-armed bandit DCOP algorithm on dynamic DCOPs.
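To make the average-reward setting concrete, the following is a minimal single-agent sketch of tabular R-learning, the classical method that the Distributed R-learning algorithm's name refers to. It is not the paper's distributed algorithm. The two-state problem, the `TRANSITIONS` table, the function names, and the step sizes `alpha` and `beta` are all assumptions made for illustration: the state stands in for the current DCOP and the action for a joint value assignment. The update shown is one common variant, in which the average-reward estimate `rho` is adjusted only after greedy actions.

```python
import random

# Hypothetical toy problem (not from the paper): the state stands in for the
# current DCOP and the action for a joint value assignment.
# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {0: (0, 1.0), 1: (1, 0.0)},
    1: {0: (1, 2.0), 1: (0, 0.0)},
}


def r_learning_update(q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning step in place on q; return the new rho.

    alpha and beta are assumed step sizes, not values from the paper.
    """
    # Check greediness before q[s][a] changes.
    was_greedy = q[s][a] == max(q[s])
    # Average-reward temporal-difference error: reward relative to rho.
    delta = r - rho + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta
    if was_greedy:
        # Only greedy steps adjust the average-reward estimate.
        rho += beta * delta
    return rho


def run(steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy R-learning on the toy problem; return (q, rho)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    rho, s = 0.0, 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(2)  # explore
        else:
            a = max(range(2), key=lambda act: q[s][act])  # exploit
        s_next, r = TRANSITIONS[s][a]
        rho = r_learning_update(q, rho, s, a, r, s_next)
        s = s_next
    return q, rho


if __name__ == "__main__":
    q_values, avg_reward = run()
    print("Q-values:", q_values)
    print("Estimated average reward:", avg_reward)
```

In this sketch the epsilon-greedy choice is a simple stand-in for the balance between exploration and exploitation mentioned above; a distributed version would additionally need agents to coordinate on the joint action, which the sketch does not attempt.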



Country:
- North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Industry:
- Information Technology (0.46)
- Energy > Oil & Gas
  - Upstream (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Agents (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
