Agents
FACMAC: Factored Multi-Agent Centralised Policy Gradients
However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic, or monotonically factored critics.
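To make the difference between a monotonic and a nonmonotonic factorisation concrete, the following is a minimal sketch and not the FACMAC implementation. It assumes a small two-layer mixer with fixed, hand-supplied weights (in QMIX-style mixers the weights would typically be produced from the global state), and the names `mix` and `elu` are hypothetical. Taking the absolute value of the mixing weights is one common way to enforce monotonicity; dropping it gives an unconstrained mixer.

```python
import math


def elu(x):
    """ELU activation; strictly increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def mix(utilities, w1, b1, w2, b2, monotonic=True):
    """Combine per-agent utilities into one joint value (illustrative only).

    utilities: list of per-agent utility values, one per agent.
    w1, b1:    hidden-layer weights (one row per hidden unit) and biases.
    w2, b2:    output-layer weights and bias.
    monotonic: if True, use |w| so the output never decreases when any
               single agent's utility increases (assumed constraint).
    """
    sign = abs if monotonic else (lambda w: w)
    hidden = [
        elu(sum(sign(w) * q for w, q in zip(row, utilities)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(sign(w) * h for w, h in zip(w2, hidden)) + b2
```

With `monotonic=True`, raising any one agent's utility can only keep or raise the joint value. With `monotonic=False` the same weights can produce a joint value that falls when one utility rises, which is the extra representational freedom the excerpt refers to.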
Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems
DePT induces a cone-shaped spatial-temporal attention prior, which injects the information propagation and aggregation principles and enables a global view. With physical constraint inductive bias baked into its design, our DePT is ready to plug and play for a broad class of multi-agent systems. The experimental results on one of the most challenging CPS - network-scale traffic signal control system in the open world - show that our model outperformed the state-of-the-art expert methods on synthetic and real-world datasets.
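The excerpt does not say how the cone-shaped prior is built. As one possible reading, the sketch below assumes that information spreads over an undirected graph at a fixed number of hops per time step, so a node's state from `lag` steps ago is relevant to a target node only if it lies within `lag * hops_per_step` hops. The function names, the `hops_per_step` parameter, and the hard boolean mask are all assumptions made for illustration and are not taken from DePT.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def cone_mask(adjacency, target, history, hops_per_step=1):
    """Return mask[lag][node]: may `target` attend to `node` as it was `lag` steps ago?

    The allowed set widens with the lag, which traces out a cone in
    space-time (assumed propagation model, see the lead-in).
    """
    dist = hop_distances(adjacency, target)
    nodes = sorted(adjacency)
    return [
        {n: dist.get(n, float("inf")) <= lag * hops_per_step for n in nodes}
        for lag in range(history + 1)
    ]
```

On a four-node chain `0-1-2-3` with target node `0`, the mask admits only node `0` at lag 0, nodes `0` and `1` at lag 1, and nodes `0` to `2` at lag 2. A learned model could use such a mask, or a soft version of it, as an additive attention bias.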