AITopics | competitive reinforcement learning

Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning

Neural Information Processing Systems | Dec-24-2025, 04:22:06 GMT

This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability because there are diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment class, is publicly available.

competitive reinforcement learning, honor, king arena, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems | Dec-23-2025, 23:13:45 GMT

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.
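The two-timescale idea in the abstract above can be illustrated on a toy problem. The following is only a minimal sketch under loud assumptions, not the paper's algorithm: it uses a single-state zero-sum matrix game instead of a stochastic game, softmax policies, and exact expected gradients in place of sampled estimates (so it ignores the "observes only their own actions and rewards" restriction). The payoff matrix, the step sizes `eta_min` and `eta_max`, the choice of the minimizing player as the slow one, and the function names are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def independent_pg(A, eta_min=0.05, eta_max=0.5, steps=5000):
    """Gradient descent-ascent on x^T A y with two different step sizes.

    The row player (policy x) minimizes the payoff and the column player
    (policy y) maximizes it. Each player updates only its own logits.
    Assumption: exact gradients are used here, not sampled estimates.
    """
    A = np.asarray(A, dtype=float)
    theta_x = np.zeros(A.shape[0])  # logits of the minimizing player
    theta_y = np.zeros(A.shape[1])  # logits of the maximizing player
    for _ in range(steps):
        x, y = softmax(theta_x), softmax(theta_y)
        value = x @ A @ y
        # Gradient of the expected payoff with respect to softmax logits.
        grad_x = x * (A @ y - value)
        grad_y = y * (A.T @ x - value)
        theta_x -= eta_min * grad_x  # slow player: small step, descent
        theta_y += eta_max * grad_y  # fast player: large step, ascent
    return softmax(theta_x), softmax(theta_y)


if __name__ == "__main__":
    # Hypothetical payoff matrix with a pure saddle point at row 0, column 1.
    x, y = independent_pg([[1.0, 2.0], [0.0, 3.0]])
    print("min player policy:", x)
    print("max player policy:", y)
```

In this toy game the second column dominates for the maximizing player, so both policies drift toward the saddle point (row 0, column 1). Games without a pure saddle point, such as matching pennies, behave very differently under plain gradient descent-ascent, which is where the step-size separation discussed in the abstract matters.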

competitive reinforcement learning, independent policy gradient method, name change, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems | May-26-2025, 20:57:29 GMT

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.

competitive reinforcement learning, independent policy gradient method, machine learning, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems | Jan-23-2025, 14:36:19 GMT

Weaknesses: I am not convinced by the paper's main motivation for decoupled, or independent, learning. Specifically, from the communication perspective, once agents can also communicate the actions each took in every round, each agent can simulate any coupled algorithm locally (or only a coupled online algorithm, if it has a storage limitation). Since agents have to communicate with the oracle or environment in each round anyway, I don't see why, in practice, communicating the actions during the learning process is that problematic. Second, this paper says that independent learning is important because it allows the algorithm "being versatile, being applicable even in uncertain environments where the type of interaction and number of other agents are not known to the agent." I feel this description does not fit the algorithm studied in this paper and is thus a bit misleading.

agent, competitive reinforcement learning, independent policy gradient method, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Review for NeurIPS paper: Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems | Jan-23-2025, 14:36:11 GMT

The reviewers agreed that this is solid work on an important problem for which existing results are scarce. However, there were several concerns, including that the authors create some confusion in describing their method as "independent", since the agents have to coordinate the learning rates ahead of time. I believe that these concerns actually open the door for interesting follow-up work, and therefore recommend acceptance. I ask the authors to tone down the independence claims in the final version, given the concern above.

competitive reinforcement learning, independent policy gradient method, neurips paper

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning

Neural Information Processing Systems | Oct-11-2024, 00:05:53 GMT

This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability because there are diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges.

competitive reinforcement learning, generalization, king arena, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Independent Policy Gradient Methods for Competitive Reinforcement Learning

Neural Information Processing Systems | Oct-10-2024, 00:21:55 GMT

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent learning in competitive RL, as prior work has largely focused on centralized/coordinated procedures for equilibrium computation.

competitive reinforcement learning, independent policy gradient method

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning

Wei, Hua, Chen, Jingxiao, Ji, Xiyang, Qin, Hongyang, Deng, Minwen, Li, Siqin, Wang, Liang, Zhang, Weinan, Yu, Yong, Liu, Lin, Huang, Lanxiao, Ye, Deheng, Fu, Qiang, Yang, Wei

arXiv.org Artificial Intelligence | Oct-18-2022

This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world's most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multi-agent problem with one agent competing against its opponent, and it requires generalization ability because there are diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment class, is publicly available at https://github.com/tencent-ailab/hok_env. The documentation is available at https://aiarena.tencent.com/hok/doc/.

artificial intelligence, competitive reinforcement learning, machine learning, (2 more...)

arXiv.org Artificial Intelligence

arXiv:2209.08483

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)
