A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

Neural Information Processing Systems 

In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly.