Policy Gradient With Value Function Approximation For Collective Multiagent Planning
Nguyen, Duc Thien, Kumar, Akshat, Lau, Hoong Chuin
Neural Information Processing Systems
Decentralized partially observable MDPs (Dec-POMDPs) provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address one such subclass, called CDec-POMDP, in which the collective behavior of a population of agents affects the joint reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDec-POMDP policies. Vanilla AC converges slowly on larger problems.
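To ground the term "vanilla AC," here is a minimal sketch of a standard tabular actor-critic update on a hypothetical two-state, two-action toy MDP (the environment, parameter names, and learning rates are illustrative assumptions, not the paper's CDec-POMDP method): the critic learns state values via TD errors, and the actor follows the policy gradient using the TD error as an advantage estimate.

```python
import math
import random

# Hypothetical toy MDP for illustration: in state 0, action 1 moves to
# state 1 with reward +1; every other transition yields reward 0 and
# returns to state 0.
N_STATES, N_ACTIONS = 2, 2

def step(s, a):
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

def softmax(prefs):
    m = max(prefs)
    e = [math.exp(p - m) for p in prefs]
    z = sum(e)
    return [x / z for x in e]

def train(steps=2000, alpha=0.1, beta=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # actor parameters
    V = [0.0] * N_STATES                                  # critic values
    s = 0
    for _ in range(steps):
        probs = softmax(theta[s])
        a = rng.choices(range(N_ACTIONS), weights=probs)[0]
        s2, r = step(s, a)
        td = r + gamma * V[s2] - V[s]   # TD error, used as advantage estimate
        V[s] += beta * td               # critic update
        for b in range(N_ACTIONS):      # actor update: grad of log-softmax
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += alpha * td * grad
        s = s2
    return theta, V
```

In this sketch the actor and critic share one TD error per transition; the paper's contribution addresses the slow convergence such vanilla updates exhibit at scale.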