 Reinforcement Learning


EDGE: Explaining Deep Reinforcement Learning Policies S1 Additional Technical Details

Neural Information Processing Systems

Note that because these are two-player games, we select the runner in You-Shall-Not-Pass and the kicker in Kick-And-Defend as our target agents. As mentioned in Section 4, we download a well-trained policy for each game.


EDGE: Explaining Deep Reinforcement Learning Policies

Neural Information Processing Systems

Deep reinforcement learning has shown great success in automatic policy learning for various sequential decision-making problems, such as training AI agents to defeat professional players in sophisticated games [74, 65, 24, 37] and controlling robots to accomplish complicated tasks [33, 38].


FACMAC: Factored Multi-Agent Centralised Policy Gradients

Neural Information Processing Systems

However, FACMAC learns a centralised but factored critic, which combines per-agent utilities into the joint action-value function via a non-linear monotonic function, as in QMIX, a popular multi-agent Q-learning algorithm. However, unlike QMIX, there are no inherent constraints on factoring the critic. We thus also employ a nonmonotonic factorisation and empirically demonstrate that its increased representational capacity allows it to solve some tasks that cannot be solved with monolithic or monotonically factored critics.
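
To make the idea of a monotonic factorisation concrete, the following is a minimal sketch of a QMIX-style mixing function that combines per-agent utilities into a joint action-value. It is not taken from the FACMAC or QMIX code: the names `monotonic_mix`, `init_params`, and `linear`, the use of single linear layers as hypernetworks, and the layer sizes are all assumptions made for illustration. The only property it is meant to show is that taking the absolute value of the state-conditioned mixing weights keeps the joint value non-decreasing in each agent's utility.

```python
import math
import random


def elu(x):
    # ELU is non-linear but monotonically increasing.
    return x if x > 0 else math.exp(x) - 1.0


def linear(weights, state):
    """Hypothetical linear hypernetwork: one output per weight row."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def monotonic_mix(agent_utils, state, params):
    """Mix per-agent utilities into one joint value, monotonic in each utility."""
    n, hidden = len(agent_utils), len(params["b1"])
    # abs() makes the state-conditioned mixing weights non-negative.
    w1 = [abs(v) for v in linear(params["w1"], state)]  # n * hidden, row-major
    b1 = linear(params["b1"], state)
    w2 = [abs(v) for v in linear(params["w2"], state)]
    b2 = linear(params["b2"], state)[0]
    h = [
        elu(sum(agent_utils[i] * w1[i * hidden + j] for i in range(n)) + b1[j])
        for j in range(hidden)
    ]
    return sum(hj * w2j for hj, w2j in zip(h, w2)) + b2


def init_params(n_agents, state_dim, hidden, seed=0):
    """Random illustrative parameters; a real critic would learn these."""
    rng = random.Random(seed)

    def mat(rows):
        return [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(rows)]

    return {"w1": mat(n_agents * hidden), "b1": mat(hidden),
            "w2": mat(hidden), "b2": mat(1)}


params = init_params(n_agents=3, state_dim=4, hidden=8)
state = [1.0, 0.5, -0.3, 2.0]
utils = [0.1, -0.5, 2.0]
q_tot = monotonic_mix(utils, state, params)
```

In this sketch, removing the two `abs()` calls would lift the monotonicity constraint and leave an unconstrained mixer, which is one simple way to picture the nonmonotonic factorisation mentioned above.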


Emergent Graphical Conventions in a Visual Communication Game

Neural Information Processing Systems

Due to their iconic nature (i.e., perceptual resemblance to or natural association with the referent), drawings serve as a powerful tool to communicate concepts transcending language barriers (Fay et al., 2014). In fact, humans began using drawings to convey messages as far back as 40,000-60,000 years ago (Hoffmann et al., 2018; Hawkins et al., 2019).