Multi-Robot Deep Reinforcement Learning with Macro-Actions

Yuchen Xiao, Joshua Hoffman, Tian Xia, Christopher Amato

arXiv.org Artificial Intelligence 

A. MacDec-POMDPs

Decentralized, fully cooperative multi-agent decision-making under uncertainty can be modeled as a decentralized POMDP (Dec-POMDP) [14]. Because Dec-POMDPs assume synchronous actions that take the same amount of time for every agent, they are not directly applicable to real-world multi-robot planning and learning. MacDec-POMDPs, formalized by introducing macro-actions into Dec-POMDPs, inherently allow asynchronous execution among robots: temporally extended macro-actions can begin and end at different times for each agent. Formally, a MacDec-POMDP is defined as a tuple ⟨I, S, A, Ω, M, ζ, O, T, Z, R⟩, where I is a finite set of agents; S is a finite set of environment states; A = ×_i A_i and Ω = ×_i Ω_i are the joint primitive-action and joint primitive-observation spaces, respectively; M = ×_i M_i is the joint set over each agent's finite macro-action space M_i; ζ = ×_i ζ_i is the joint set over each agent's finite macro-observation space ζ_i; T is the state transition function; and Z is the macro-observation probability function. Given a macro-action-based policy, each agent i asynchronously chooses a macro-action m_i = ⟨β_m, I_m, π_m⟩ that depends on its individual macro-action-observation history, where β_m: H_i^A → [0, 1] is the stochastic termination condition and I_m ⊆ H_i^M is the initiation set of the corresponding macro-action m_i, respectively defined over the primitive-action-observation history space H_i^A and the macro-action-observation history space H_i^M of agent i; π_m: H_i^A → A_i denotes the low-level policy for achieving the macro-action m. During execution, each agent's primitive observation o_i ∈ Ω_i is generated according to the observation function O_i(o_i, a_i, s) = Pr(o_i | a_i, s), and a shared immediate reward r(s, a⃗), where a⃗ ∈ A = ×_i A_i, is issued according to the reward function R: S × A → ℝ.
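To make the macro-action tuple ⟨β_m, I_m, π_m⟩ concrete, the following is a minimal sketch of how one such option-style macro-action might be represented and executed. All names (`MacroAction`, `run_macro_action`, the toy `move_forward` macro, the `observe` callback) are illustrative assumptions, not the authors' implementation: the low-level policy π_m maps the primitive action-observation history to a primitive action, and β_m returns a termination probability checked after every primitive step.

```python
import random

class MacroAction:
    """A macro-action m = <beta_m, I_m, pi_m> (illustrative structure)."""

    def __init__(self, name, low_level_policy, termination_prob, initiation_set):
        self.name = name
        self.pi = low_level_policy        # pi_m: primitive history -> primitive action
        self.beta = termination_prob      # beta_m: primitive history -> [0, 1]
        self.initiation = initiation_set  # I_m: macro histories where m may begin

def run_macro_action(macro, primitive_history, observe, rng, max_steps=50):
    """Run the macro's low-level policy until beta_m signals termination.

    `observe` stands in for the environment: it returns a primitive
    observation o_i for the executed primitive action.
    """
    for _ in range(max_steps):
        action = macro.pi(primitive_history)
        obs = observe(action)
        primitive_history.append((action, obs))
        # Stochastic termination: stop with probability beta_m(history).
        if rng.random() < macro.beta(primitive_history):
            break
    return primitive_history

# Toy macro: keep issuing "forward" and terminate once 3 primitive steps
# have been taken (a deterministic beta_m for clarity).
move = MacroAction(
    name="move_forward",
    low_level_policy=lambda h: "forward",
    termination_prob=lambda h: 1.0 if len(h) >= 3 else 0.0,
    initiation_set=lambda macro_history: True,  # applicable everywhere here
)

history = run_macro_action(move, [], observe=lambda a: "clear",
                           rng=random.Random(0))
print(len(history))  # -> 3 primitive steps before beta_m fires
```

Because each agent runs its own such loop independently, macro-actions naturally finish at different times across agents, which is exactly the asynchrony that distinguishes MacDec-POMDPs from Dec-POMDPs with synchronized primitive steps.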
