Supplementary Materials for XDO: A Double Oracle Algorithm for Extensive-Form Games

Neural Information Processing Systems

Supplementary materials containing the proofs for the XDO paper. Among the results: in the (k, m)-clone GMP with n classes, XDO adds at most 2n actions in total for each player. In the accompanying figures, actions that are in the restricted game are represented by bold arrows, and extensive-form pure strategies specify an action at every infostate.



Unified Policy Optimization for Continuous-action Reinforcement Learning in Non-stationary Tasks and Games

Qin, Rong-Jun, Luo, Fan-Ming, Qian, Hong, Yu, Yang

arXiv.org Artificial Intelligence

This paper addresses policy learning in non-stationary environments and games with continuous actions. Rather than relying on the classical reward-maximization mechanism, we draw on the ideas of follow-the-regularized-leader (FTRL) and mirror descent (MD) updates and propose PORL, a no-regret-style reinforcement learning algorithm for continuous-action tasks. We prove that PORL has a last-iterate convergence guarantee, which is important for adversarial and cooperative games. Empirical studies show that in stationary environments such as MuJoCo locomotion-control tasks, PORL performs as well as, if not better than, the soft actor-critic (SAC) algorithm; in non-stationary settings, including dynamic environments, adversarial training, and competitive games, PORL surpasses SAC in both final policy performance and training stability.
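
The abstract describes, but does not spell out, the FTRL/MD-style update. As a rough intuition, here is a minimal sketch of the discrete-action special case: entropy-mirror-descent (multiplicative-weights) self-play on rock-paper-scissors. The payoff matrix, initial strategies, step size, and iteration count are illustrative assumptions; PORL's actual contribution is extending such updates to continuous actions with a last-iterate guarantee.

```python
import numpy as np

def md_update(policy, utilities, lr):
    """One entropy-mirror-descent (multiplicative-weights) step on the simplex."""
    logits = np.log(policy) + lr * utilities
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

# Toy demo on rock-paper-scissors (row player's payoff matrix).
# Asymmetric starting points, so the dynamics are nontrivial.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
x = np.array([0.5, 0.3, 0.2])
y = np.array([0.2, 0.5, 0.3])
for _ in range(1000):
    # Simultaneous update: the right-hand side uses the previous x and y.
    x, y = md_update(x, A @ y, 0.1), md_update(y, -A.T @ x, 0.1)
print(x, y)  # iterates orbit the uniform equilibrium rather than settling on it
```

Plain simultaneous mirror descent like this cycles around the equilibrium instead of converging in the last iterate, which is exactly the gap that regularized, PORL-style updates are designed to close.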


Solving Large-Scale Extensive-Form Network Security Games via Neural Fictitious Self-Play

Xue, Wanqi, Zhang, Youzhi, Li, Shuxin, Wang, Xinrun, An, Bo, Yeo, Chai Kiat

arXiv.org Artificial Intelligence

Securing networked infrastructures is important in the real world. The problem of deploying security resources to protect against an attacker in networked domains can be modeled as Network Security Games (NSGs). Unfortunately, existing approaches, including deep learning-based ones, are inefficient at solving large-scale extensive-form NSGs. In this paper, we propose a novel learning paradigm, NSG-NFSP, to solve large-scale extensive-form NSGs based on Neural Fictitious Self-Play (NFSP). Our main contributions include: i) reforming the best response (BR) policy network in NFSP into a mapping from state-action pair to action-value, making the calculation of BRs feasible in NSGs; ii) converting the average-policy network of an NFSP agent into a metric-based classifier, helping the agent assign probability only to legal actions rather than all actions; iii) enabling NFSP with high-level actions, which benefits training efficiency and stability in NSGs; and iv) leveraging the information contained in the graphs of NSGs by learning efficient graph node embeddings. Our algorithm significantly outperforms state-of-the-art algorithms in both scalability and solution quality.
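
Contribution ii) amounts to restricting the policy's support to legal actions. The paper's metric-based classifier is not detailed in the abstract, so the sketch below shows only the generic masking idea it relies on, a softmax computed over legal actions only; the logits and mask are made-up examples.

```python
import numpy as np

def masked_policy(logits, legal_mask):
    """Softmax restricted to legal actions; illegal actions get probability 0."""
    logits = np.where(legal_mask, logits, -np.inf)
    z = np.exp(logits - logits[legal_mask].max())  # exp(-inf) = 0 for illegal actions
    return z / z.sum()

logits = np.array([2.0, 0.5, -1.0, 3.0])
legal = np.array([True, False, True, True])
print(masked_policy(logits, legal))  # index 1 receives exactly zero probability
```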


Optimize Neural Fictitious Self-Play in Regret Minimization Thinking

Chen, Yuxuan, Zhang, Li, Li, Shijian, Pan, Gang

arXiv.org Artificial Intelligence

Optimizing deep learning algorithms to approach a Nash equilibrium remains a significant problem in imperfect-information games, e.g., StarCraft and poker. Neural Fictitious Self-Play (NFSP) provides an effective way to learn an approximate Nash equilibrium without prior domain knowledge in imperfect-information games. However, the optimality gap was left as an open optimization problem in NFSP, and closing it can improve NFSP's performance. In this study, focusing on the optimality gap of NFSP, we propose a new method that replaces NFSP's best-response computation with regret matching. The new algorithm drives the optimality gap to zero as it iterates, and thus converges faster than the original NFSP. We conduct experiments on three typical perfect-information and imperfect-information game environments in OpenSpiel, all of which show that our new algorithm performs better than the original NFSP.
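
Regret matching itself is the standard Hart and Mas-Colell procedure, so a self-contained sketch is easy to give; the rock-paper-scissors self-play demo below is an illustrative assumption, not the paper's experimental setup.

```python
import numpy as np

def regret_matching_strategy(cum_regret):
    """Play actions in proportion to positive cumulative regret;
    fall back to uniform if no action has positive regret."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(cum_regret), 1.0 / len(cum_regret))

# Self-play demo on rock-paper-scissors (row player's payoff matrix).
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
r1 = np.array([1.0, 0.0, 0.0])  # asymmetric seed so the dynamics are nontrivial
r2 = np.zeros(3)
avg1 = np.zeros(3)
T = 10000
for t in range(T):
    s1 = regret_matching_strategy(r1)
    s2 = regret_matching_strategy(r2)
    u1 = A @ s2           # expected utility of each action for player 1
    u2 = -A.T @ s1        # and for player 2
    r1 += u1 - s1 @ u1    # instantaneous regret: action value minus strategy value
    r2 += u2 - s2 @ u2
    avg1 += s1
print(avg1 / T)  # average strategy approaches the uniform equilibrium (1/3, 1/3, 1/3)
```

The per-iteration strategies fluctuate, but the average strategy (printed at the end) approaches equilibrium; this no-regret property is what the proposed method exploits in place of NFSP's DQN-style best response.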


XDO: A Double Oracle Algorithm for Extensive-Form Games

McAleer, Stephen, Lanier, John, Baldi, Pierre, Fox, Roy

arXiv.org Artificial Intelligence

Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm for two-player zero-sum games that has empirically found approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to a Nash equilibrium, it may take an exponential number of iterations as the number of infostates grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations 1-2 orders of magnitude smaller than PSRO. In experiments on a modified Leduc poker game, we show that tabular XDO achieves over 11x lower exploitability than CFR and over 82x lower exploitability than PSRO and XFP in the same amount of time. We also show that NXDO beats PSRO and is competitive with NFSP on a large no-limit poker game.
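
The key difference from PSRO can be seen in the outer loop. The sketch below is a reading of the abstract, not the authors' code: `solve_restricted_game`, `best_response`, and the `game` interface are hypothetical stand-ins for a tabular meta-solver (e.g., CFR run on the restricted game), a best-response oracle (exact in XDO, deep RL in NXDO), and the underlying two-player zero-sum game.

```python
def xdo(game, solve_restricted_game, best_response, epsilon):
    """Sketch of the XDO outer loop; all helpers are hypothetical stand-ins."""
    populations = [[game.initial_policy(p)] for p in (0, 1)]
    while True:
        # Restricted game: at each infostate a player mixes over the actions
        # chosen there by the policies in its population, rather than mixing
        # over whole root-level policies as PSRO does.
        meta_ne = solve_restricted_game(game, populations)
        brs = [best_response(game, p, meta_ne) for p in (0, 1)]
        # The sum of best-response gains bounds the exploitability of meta_ne.
        gap = sum(brs[p].value - meta_ne.value(p) for p in (0, 1))
        if gap <= epsilon:
            return meta_ne  # approximate Nash equilibrium reached
        for p in (0, 1):
            populations[p].append(brs[p].policy)
```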


On Solving Cooperative MARL Problems with a Few Good Experiences

Kumar, Rajiv Ranjan, Varakantham, Pradeep

arXiv.org Artificial Intelligence

Cooperative Multi-agent Reinforcement Learning (MARL) is crucial for cooperative decentralized decision learning in many domains such as search and rescue, drone surveillance, package delivery, and fire fighting. In these domains, a key challenge is learning with a few good experiences, i.e., positive reinforcement is obtained only in a few situations (e.g., on extinguishing a fire, tracking a crime, or delivering a package), and in most other situations there is zero or negative reinforcement. Learning decisions from a few good experiences is extremely challenging in cooperative MARL problems for three reasons. First, compared to the single-agent case, exploration is harder, as multiple agents have to be coordinated to receive a good experience. Second, the environment is not stationary, since all agents learn (and hence change policies) at the same time. Third, the scale of the problem increases significantly with every additional agent. Relevant existing work is extensive and has focused either on dealing with a few good experiences in single-agent RL or on scalable approaches for handling non-stationarity in MARL; unfortunately, neither of these approaches (or their extensions) addresses the problem of sparse good experiences effectively. We therefore provide a novel fictitious self-imitation approach that simultaneously handles non-stationarity and sparse good experiences in a scalable manner. Finally, we provide a thorough comparison (experimental or descriptive) against relevant cooperative MARL algorithms to demonstrate the utility of our approach.
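
The abstract does not give the fictitious self-imitation objective. As a related reference point only, here is the generic self-imitation loss of Oh et al. (2018), which weights stored transitions by their clipped advantage so that the few good experiences dominate the update; it is explicitly not this paper's exact method, and the `beta` coefficient is an illustrative assumption.

```python
import torch

def self_imitation_loss(log_probs, values, returns, beta=0.01):
    """Generic self-imitation loss (Oh et al., 2018); NOT this paper's objective.
    Only transitions whose observed return beats the current value estimate
    contribute, so the few good experiences dominate the update.

    log_probs: log pi(a_t | s_t) for stored transitions, shape (N,)
    values:    critic estimates V(s_t), shape (N,)
    returns:   observed discounted returns R_t from the buffer, shape (N,)
    """
    good = (returns - values).clamp(min=0.0)           # clipped advantage
    policy_loss = -(log_probs * good.detach()).mean()  # imitate only good actions
    value_loss = 0.5 * (good ** 2).mean()              # pull V(s) up toward good returns
    return policy_loss + beta * value_loss
```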


Neural Fictitious Self-Play on ELF Mini-RTS

Kawamura, Keigo, Tsuruoka, Yoshimasa

arXiv.org Machine Learning

Despite notable successes in video games such as those on the Atari 2600, current AI has yet to defeat human champions in the domain of real-time strategy (RTS) games. One reason is that an RTS game is a multi-agent game, in which single-agent reinforcement learning methods cannot simply be applied because the environment is not a stationary Markov decision process. In this paper, we present a first step toward finding a game-theoretic solution to RTS games by applying Neural Fictitious Self-Play (NFSP), a game-theoretic approach for finding Nash equilibria, to Mini-RTS, a small but nontrivial RTS game provided on the ELF platform. More specifically, we show that NFSP can be effectively combined with policy-gradient reinforcement learning and applied to Mini-RTS. Experimental results also show that the scalability of NFSP can be substantially improved by pretraining the models with simple self-play using policy gradients, which by itself yields a strong strategy despite lacking a theoretical convergence guarantee.
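
For readers unfamiliar with NFSP's control flow, the sketch below shows the standard anticipatory mixing that the paper builds on (with a policy-gradient best response in place of the original DQN); the function names and the placement of the `eta` parameter are generic NFSP conventions, not ELF-specific code.

```python
import random

def nfsp_act(state, br_policy, avg_policy, eta, sl_buffer):
    """Standard NFSP action selection (generic sketch, not ELF-specific code).

    With probability eta, act with the best-response policy (trained by RL,
    here policy gradients per the paper) and record the pair for supervised
    learning of the average policy; otherwise act with the average policy.
    """
    if random.random() < eta:
        action = br_policy(state)
        sl_buffer.append((state, action))  # reservoir buffer for averaging
    else:
        action = avg_policy(state)
    return action
```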