Planning and Learning for Decentralized MDPs with Event Driven Rewards
Gupta, Tarun (International Institute of Information Technology, Hyderabad) | Kumar, Akshat (Singapore Management University) | Paruchuri, Praveen (International Institute of Information Technology, Hyderabad)
Decentralized (PO)MDPs provide a rigorous framework for sequential multiagent decision making under uncertainty. However, their high computational complexity limits their practical impact. To address scalability and real-world impact, we focus on settings where a large number of agents primarily interact through complex joint rewards that depend on their entire histories of states and actions. Such history-based rewards encapsulate the notion of events or tasks such that the team reward is given only when the joint task is completed. Algorithmically, we contribute: 1) a nonlinear programming (NLP) formulation for such an event-based planning model; 2) a probabilistic inference based approach that scales much better than NLP solvers as the number of agents grows; 3) a policy gradient based multiagent reinforcement learning approach that scales well even for exponential state spaces.
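To make the event-driven reward idea concrete, here is a minimal sketch, not the paper's algorithm: independent REINFORCE-style policy gradients for a toy decentralized problem in which the team reward is paid only if every agent's history contains a designated action. The environment, the "each agent must take action 1 at least once" event, and all hyperparameters (N_AGENTS, HORIZON, LR, etc.) are hypothetical choices made for illustration.

```python
# Minimal sketch: event-driven team reward + independent policy gradients.
# All names and the toy event are illustrative assumptions, not the paper's model.
import numpy as np

N_AGENTS, N_ACTIONS, HORIZON = 5, 2, 10
LR, EPISODES = 0.1, 3000
rng = np.random.default_rng(0)

# One softmax policy parameter vector per agent (stateless for simplicity).
theta = np.zeros((N_AGENTS, N_ACTIONS))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for ep in range(EPISODES):
    # History-based event tracker: has agent i ever taken action 1 this episode?
    task_done = np.zeros(N_AGENTS, dtype=bool)
    grads = np.zeros_like(theta)
    for t in range(HORIZON):
        for i in range(N_AGENTS):
            probs = softmax(theta[i])
            a = rng.choice(N_ACTIONS, p=probs)
            task_done[i] |= (a == 1)
            # Accumulate grad log pi(a) for the REINFORCE update:
            # for a softmax policy this is onehot(a) - probs.
            grads[i] += np.eye(N_ACTIONS)[a] - probs
    # Event-driven team reward: paid only when the *joint* task is complete,
    # i.e., the reward depends on the agents' full histories, not one state.
    R = 1.0 if task_done.all() else 0.0
    theta += LR * R * grads
```

Because the reward is a function of entire joint histories rather than the current joint state, each agent's gradient is weighted by the single end-of-episode team return, which is what lets this style of learner sidestep enumerating the exponential joint state space.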
Apr-6-2018