competitive reinforcement learning
Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning
This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on the Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent; and it requires the generalization ability as it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment-class, are publicly available.
Independent Policy Gradient Methods for Competitive Reinforcement Learning
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.
Independent Policy Gradient Methods for Competitive Reinforcement Learning
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.
Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning
Weaknesses: I am not convinced by the main motivation of this paper for decoupled or independent learning. Specifically, from the communication perspective, once agents can also communicate the actions each other took per round, then each agent can also simulate any coupled algorithm locally (or only coupled online algorithm if has storage limitation). Since agents have to communicate with the oracle or environment in each round anyway, I don't see in practice why communicate the actions in the learning process is that problematic. Second, this paper says that the independent learning is important because it allows the algorithm "being versatile, being applicable even in uncertain environments where the type of interaction and number of other agents are not known to the agent. " I feel this description does not fit the algorithm studied in this paper, thus a bit misleading.
Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning
The reviewers agreed that this is a solid work, on an important problem for which existing results are scarce. However, there were several concerns: - The authors create some confusion in describing their method as "independent" - the agents have to coordinate the learning rates ahead of time. I believe that these concerns actually open the door for interesting followup work, and therefore recommend acceptance. I ask the authors to tone down the independence claims in the final version, given the concern above.
Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning
This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on the Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent; and it requires the generalization ability as it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges.
Independent Policy Gradient Methods for Competitive Reinforcement Learning
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.
Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning
Wei, Hua, Chen, Jingxiao, Ji, Xiyang, Qin, Hongyang, Deng, Minwen, Li, Siqin, Wang, Liang, Zhang, Weinan, Yu, Yong, Liu, Lin, Huang, Lanxiao, Ye, Deheng, Fu, Qiang, Yang, Wei
This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent; and it requires the generalization ability as it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment-class, are publicly available at https://github.com/tencent-ailab/hok_env . The documentation is available at https://aiarena.tencent.com/hok/doc/ .