Goto

Collaborating Authors

 texas hold


No-Regret Strategy Solving in Imperfect-Information Games via Pre-Trained Embedding

Fu, Yanchang, Liu, Shengda, Xu, Pei, Huang, Kaiqi

arXiv.org Artificial Intelligence

High-quality information set abstraction remains a core challenge in solving large-scale imperfect-information extensive-form games (IIEFGs)--such as no-limit Texas Hold'em--where the finite nature of spatial resources hinders solving strategies for the full game. State-of-the-art AI methods rely on pre-trained discrete clustering for abstraction, yet their hard classification irreversibly discards critical information: specifically, the quantifiable subtle differences between information sets--vital for strategy solving--thus compromising the quality of such solving. Inspired by the word embedding paradigm in natural language processing, this paper proposes the Embedding CFR algorithm, a novel approach for solving strategies in IIEFGs within an embedding space. The algorithm pre-trains and embeds the features of individual information sets into an interconnected low-dimensional continuous space, where the resulting vectors more precisely capture both the distinctions and connections between information sets. Embedding CFR introduces a strategy-solving process driven by regret accumulation and strategy updates in this embedding space, with supporting theoretical analysis verifying its ability to reduce cumulative regret. Experiments on poker show that with the same spatial overhead, Embedding CFR achieves significantly faster exploitability convergence compared to cluster-based abstraction algorithms, confirming its effectiveness. Furthermore, to our knowledge, it is the first algorithm in poker AI that pre-trains information set abstractions via low-dimensional embedding for strategy solving.


How Hacked Card Shufflers Allegedly Enabled a Mob-Fueled Poker Scam That Rocked the NBA

WIRED

WIRED recently demonstrated how to cheat at poker by hacking the Deckmate 2 card shufflers used in casinos. The mob was allegedly using the same trick to fleece victims for millions. Security researcher Joseph Tartaro demonstrates how he can insert a hacking device into a USB on the back of the shuffler that alters its code, then transmits the deck's order via Bluetooth to a phone app. The Deckmate 2 automatic card shufflers used in casinos, cardhouses, and high-end private poker games around the world are designed to shuffle a deck in seconds with perfect, computer-generated randomness, vastly speeding up play. They're also, amazingly, sold with a camera inside that can observe every card in the deck before it's dealt--a fact that's become very convenient for poker-cheating hackers and, allegedly, members of the Cosa Nostra mafia.


Can Large Language Models Master Complex Card Games?

Wang, Wei, Bie, Fuqing, Chen, Junzhe, Zhang, Dan, Huang, Shiyu, Kharlamov, Evgeny, Tang, Jie

arXiv.org Artificial Intelligence

Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and Chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data, and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance augmentation for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/LLM4CardGame


Preference-CFR$\:$ Beyond Nash Equilibrium for Better Game Strategies

Ju, Qi, Tellier, Thomas, Sun, Meng, Fang, Zhemei, Luo, Yunfeng

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence (AI) have leveraged large-scale games as benchmarks to gauge progress, with AI now frequently outperforming human capabilities. Traditionally, this success has largely relied on solving Nash equilibrium (NE) using variations of the counterfactual regret minimization (CFR) method in games with incomplete information. However, the variety of Nash equilibria has been largely overlooked in previous research, limiting the adaptability of AI to meet diverse human preferences. To address this challenge, where AI is powerful but struggles to meet customization needs, we introduce a novel approach: Preference-CFR, which incorporates two new parameters: preference degree and vulnerability degree. These parameters allow for greater flexibility in AI strategy development without compromising convergence. Our method significantly alters the distribution of final strategies, enabling the creation of customized AI models that better align with individual user needs. Using Texas Hold'em as a case study, our experiments demonstrate how Preference CFR can be adjusted to either emphasize customization, prioritizing user preferences, or to enhance performance, striking a balance between the depth of customization and strategic optimality.


Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

Zhang, Yadong, Mao, Shaoguang, Wu, Wenshan, Xia, Yan, Ge, Tao, Lan, Man, Wei, Furu

arXiv.org Artificial Intelligence

This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ uni-directional (left-to-right) reasoning strategy. This lack of bi-directional deliberation reasoning results in limited awareness of potential future outcomes and insufficient integration of historical context, leading to suboptimal decisions. BIDDER addresses this gap by incorporating principles of rational decision-making, specifically managing uncertainty and predicting expected utility. Our approach involves three key processes: Inferring hidden states to represent uncertain information in the decision-making process from historical data; Using these hidden states to predict future potential states and potential outcomes; Integrating historical information (past contexts) and long-term outcomes (future contexts) to inform reasoning. By leveraging bi-directional reasoning, BIDDER ensures thorough exploration of both past and future contexts, leading to more informed and rational decisions. We tested BIDDER's effectiveness in two well-defined scenarios: Poker (Limit Texas Hold'em) and Negotiation. Our experiments demonstrate that BIDDER significantly improves the decision-making capabilities of LLMs and LLM agents.


Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

Zhang, Wenqi, Tang, Ke, Wu, Hai, Wang, Mengna, Shen, Yongliang, Hou, Guiyang, Tan, Zeqi, Li, Peng, Zhuang, Yueting, Lu, Weiming

arXiv.org Artificial Intelligence

Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers necessitate manually crafted prompts to inform task rules and regulate LLM behaviors, inherently incapacitating to address complex dynamic scenarios e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than action-level reflection, Agent-Pro iteratively reflects on past trajectories and beliefs, fine-tuning its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games: Blackjack and Texas Hold'em, outperforming vanilla LLM and specialized models. Our results show Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.


Instruction-Driven Game Engines on Large Language Models

Wu, Hongqiu, Wang, Y., Liu, Xingyuan, Zhao, Hai, Zhang, Min

arXiv.org Artificial Intelligence

The Instruction-Driven Game Engine (IDGE) project aims to democratize game development by enabling a large language model (LLM) to follow free-form game rules and autonomously generate game-play processes. The IDGE allows users to create games by issuing simple natural language instructions, which significantly lowers the barrier for game development. We approach the learning process for IDGEs as a Next State Prediction task, wherein the model autoregressively predicts in-game states given player actions. It is a challenging task because the computation of in-game states must be precise; otherwise, slight errors could disrupt the game-play. To address this, we train the IDGE in a curriculum manner that progressively increases the model's exposure to complex scenarios. Our initial progress lies in developing an IDGE for Poker, a universally cherished card game. The engine we've designed not only supports a wide range of poker variants but also allows for high customization of rules through natural language inputs. Furthermore, it also favors rapid prototyping of new games from minimal samples, proposing an innovative paradigm in game development that relies on minimal prompt and data engineering. This work lays the groundwork for future advancements in instruction-driven game creation, potentially transforming how games are designed and played.


ELA: Exploited Level Augmentation for Offline Learning in Zero-Sum Games

Lei, Shiqi, Lee, Kanghoon, Li, Linjing, Park, Jinkyoo, Li, Jiachen

arXiv.org Artificial Intelligence

Offline learning has become widely used due to its ability to derive effective policies from offline datasets gathered by expert demonstrators without interacting with the environment directly. Recent research has explored various ways to enhance offline learning efficiency by considering the characteristics (e.g., expertise level or multiple demonstrators) of the dataset. However, a different approach is necessary in the context of zero-sum games, where outcomes vary significantly based on the strategy of the opponent. In this study, we introduce a novel approach that uses unsupervised learning techniques to estimate the exploited level of each trajectory from the offline dataset of zero-sum games made by diverse demonstrators. Subsequently, we incorporate the estimated exploited level into the offline learning to maximize the influence of the dominant strategy. Our method enables interpretable exploited level estimation in multiple zero-sum games and effectively identifies dominant strategy data. Also, our exploited level augmented offline learning significantly enhances the original offline learning algorithms including imitation learning and offline reinforcement learning for zero-sum games.


LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments

Chen, Junzhe, Hu, Xuming, Liu, Shuodi, Huang, Shiyu, Tu, Wei-Wei, He, Zhaofeng, Wen, Lijie

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) have revealed their potential for achieving autonomous agents possessing human-level intelligence. However, existing benchmarks for evaluating LLM Agents either use static datasets, potentially leading to data leakage or focus only on single-agent scenarios, overlooking the complexities of multi-agent interactions. There is a lack of a benchmark that evaluates the diverse capabilities of LLM agents in multi-agent, dynamic environments. To this end, we introduce LLMArena, a novel and easily extensible framework for evaluating the diverse capabilities of LLM in multi-agent dynamic environments. LLMArena encompasses seven distinct gaming environments, employing Trueskill scoring to assess crucial abilities in LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration. We conduct an extensive experiment and human evaluation among different sizes and types of LLMs, showing that LLMs still have a significant journey ahead in their development towards becoming fully autonomous agents, especially in opponent modeling and team collaboration. We hope LLMArena could guide future research towards enhancing these capabilities in LLMs, ultimately leading to more sophisticated and practical applications in dynamic, multi-agent settings. The code and data will be available.


A Survey on Game Theory Optimal Poker

Sonawane, Prathamesh, Chheda, Arav

arXiv.org Artificial Intelligence

Poker is in the family of imperfect information games unlike other games such as chess, connect four, etc which are perfect information game instead. While many perfect information games have been solved, no non-trivial imperfect information game has been solved to date. This makes poker a great test bed for Artificial Intelligence research. In this paper we firstly compare Game theory optimal poker to Exploitative poker. Secondly, we discuss the intricacies of abstraction techniques, betting models, and specific strategies employed by successful poker bots like Tartanian[1] and Pluribus[6]. Thirdly, we also explore 2-player vs multi-player games and the limitations that come when playing with more players. Finally, this paper discusses the role of machine learning and theoretical approaches in developing winning strategies and suggests future directions for this rapidly evolving field.