Reinforcement Learning


SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Neural Information Processing Systems

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood.



Active Exploration for Inverse Reinforcement Learning

Neural Information Processing Systems

Instead of using an explicit reward function, Inverse Reinforcement Learning (IRL; Ng et al., 2000) seeks to recover the reward by observing an expert, e.g., a human who already knows how to perform a task. However, most existing IRL algorithms assume that the transition model, and in some cases the expert's policy, are known.






Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems

The goal of reinforcement learning (RL) [39] is to maximize the expected total reward by taking actions according to a policy in a stochastic environment, which is modelled as a Markov decision process (MDP) [4]. To obtain an optimal policy, one popular method is the direct maximization of the expected total reward via gradient ascent, which is referred to as the policy gradient (PG) method [40,47].
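
As a rough sketch of the gradient-ascent idea described above, the objective and its update can be written as follows. The notation is assumed here for illustration and is not taken from the abstract: $\theta$ parameterizes the policy $\pi_\theta$, $r_t$ is the reward at step $t$, $\gamma \in (0, 1)$ is a discount factor (one common way to make the total reward finite), and $\alpha > 0$ is a step size.

```latex
J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right],
\qquad
\theta_{k+1} = \theta_k + \alpha \, \nabla_\theta J(\theta_k)
```

In practice the gradient $\nabla_\theta J(\theta_k)$ is not available in closed form and is typically estimated from sampled trajectories.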