 limit texas hold


Learning Strategy Representation for Imitation Learning in Multi-Agent Games

arXiv.org Artificial Intelligence

The offline datasets for imitation learning (IL) in multi-agent games typically contain player trajectories exhibiting diverse strategies, which necessitate measures to prevent learning algorithms from acquiring undesirable behaviors. Learning representations for these trajectories is an effective approach to depicting the strategies employed by each demonstrator. However, existing learning strategies often require player identification or rely on strong assumptions, which are not appropriate for multi-agent games. Therefore, in this paper, we introduce the Strategy Representation for Imitation Learning (STRIL) framework, which (1) effectively learns strategy representations in multi-agent games, (2) estimates proposed indicators based on these representations, and (3) filters out sub-optimal data using the indicators. STRIL is a plug-in method that can be integrated into existing IL algorithms. We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments.
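The filtering step (3) can be illustrated with a minimal sketch. It assumes each trajectory already carries a scalar indicator score where higher means closer to dominant play; the `Trajectory` class, the `indicator` field, and the `keep_fraction` parameter are hypothetical names introduced here, not part of STRIL, and the abstract does not specify how the indicators are computed or thresholded.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    # Hypothetical container: a list of (observation, action) pairs plus a score.
    steps: list = field(default_factory=list)
    indicator: float = 0.0  # assumed: higher means closer to dominant play


def filter_trajectories(trajectories, keep_fraction=0.5):
    """Keep the top `keep_fraction` of trajectories ranked by indicator score."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(trajectories, key=lambda t: t.indicator, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    data = [Trajectory(indicator=s) for s in (0.1, 0.9, 0.5, 0.7)]
    kept = filter_trajectories(data, keep_fraction=0.5)
    print([t.indicator for t in kept])
```

The retained trajectories would then be passed unchanged to whichever imitation learning algorithm is in use, which is what makes a filter of this kind a plug-in step.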


Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning

arXiv.org Artificial Intelligence

This paper introduces BI-Directional DEliberation Reasoning (BIDDER), a novel reasoning approach to enhance the decision rationality of language models. Traditional reasoning methods typically rely on historical information and employ a uni-directional (left-to-right) reasoning strategy. This lack of bi-directional deliberation reasoning results in limited awareness of potential future outcomes and insufficient integration of historical context, leading to suboptimal decisions. BIDDER addresses this gap by incorporating principles of rational decision-making, specifically managing uncertainty and predicting expected utility. Our approach involves three key processes: (1) inferring hidden states to represent uncertain information in the decision-making process from historical data; (2) using these hidden states to predict potential future states and outcomes; and (3) integrating historical information (past contexts) and long-term outcomes (future contexts) to inform reasoning. By leveraging bi-directional reasoning, BIDDER ensures thorough exploration of both past and future contexts, leading to more informed and rational decisions. We tested BIDDER's effectiveness in two well-defined scenarios: Poker (Limit Texas Hold'em) and Negotiation. Our experiments demonstrate that BIDDER significantly improves the decision-making capabilities of LLMs and LLM agents.


Approximate exploitability: Learning a best response in large games

arXiv.org Machine Learning

A standard metric used to measure the approximate optimality of policies in imperfect information games is exploitability, i.e., the performance of a policy against its worst-case opponent. However, exploitability is intractable to compute in large games as it requires a full traversal of the game tree to calculate a best response to the given policy. We introduce a new metric, approximate exploitability, that calculates an analogous metric using an approximate best response; the approximation is done by using search and reinforcement learning. This is a generalization of local best response, a domain-specific evaluation metric used in poker. We provide empirical results for a specific instance of the method, demonstrating that our method converges to exploitability in the tabular and function approximation settings for small games. In large games, our method learns to exploit both strong and weak agents, learning to exploit an AlphaZero agent.
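To make the exact metric concrete, the sketch below computes exploitability by brute force in a two-player zero-sum matrix game, where enumerating every action is trivial. It is not the approximate method from the abstract, which replaces the exact best response with one learned through search and reinforcement learning. Averaging the two best-response values is one common convention and is an assumption here; the function names are illustrative.

```python
def best_response_value(payoff, opponent_strategy):
    """Best payoff the responder can get against a fixed opponent mixed strategy.

    payoff[i][j] is the responder's payoff for its action i against opponent action j.
    """
    values = [
        sum(p * q for p, q in zip(row, opponent_strategy)) for row in payoff
    ]
    return max(values)


def exploitability(row_payoff, row_strategy, col_strategy):
    """Average best-response gain over both players in a zero-sum matrix game."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    # The column player's payoff is the negated transpose of the row player's.
    col_payoff = [
        [-row_payoff[i][j] for i in range(n_rows)] for j in range(n_cols)
    ]
    row_br = best_response_value(row_payoff, col_strategy)
    col_br = best_response_value(col_payoff, row_strategy)
    # In a zero-sum game the players' values under the profile cancel out,
    # so the total best-response gain is just the sum of the two values.
    return (row_br + col_br) / 2.0


# Rock-paper-scissors payoffs for the row player (rows and columns: R, P, S).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

if __name__ == "__main__":
    uniform = [1 / 3, 1 / 3, 1 / 3]
    always_rock = [1.0, 0.0, 0.0]
    print(exploitability(RPS, uniform, uniform))
    print(exploitability(RPS, always_rock, always_rock))
```

The uniform strategy is the equilibrium of rock-paper-scissors, so its exploitability is zero, while always playing rock loses a full unit to a best responder. The `max` over all actions is the step that becomes intractable when the game tree is large.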


Successful Nash Equilibrium Agent for a 3-Player Imperfect-Information Game

arXiv.org Artificial Intelligence

Creating strong agents for games with more than two players is a major open problem in AI. Common approaches are based on approximating game-theoretic solution concepts such as Nash equilibrium, which have strong theoretical guarantees in two-player zero-sum games, but no guarantees in non-zero-sum games or in games with more than two players. We describe an agent that is able to defeat a variety of realistic opponents using an exact Nash equilibrium strategy in a 3-player imperfect-information game. This shows that, despite a lack of theoretical guarantees, agents based on Nash equilibrium strategies can be successful in multiplayer games after all.


How a poker-playing AI is learning to negotiate better than any human

#artificialintelligence

In 2012, a comic made its way around the internet listing games on a scale of how close they were to being dominated by artificial intelligence. Checkers and tic-tac-toe had already been conquered; chess's human champion had been dethroned, and IBM's Watson had taken no prisoners on Jeopardy. The "Computers may never outplay humans" section still had its stalwarts: Calvinball (the game in Bill Watterson's Calvin and Hobbes where the rules are made up on the fly) and Seven Minutes in Heaven. Just one step up, listed under "Computers still lose to top humans," were the Chinese game Go and the American pastime poker. Ph.D. candidate Noam Brown is sitting next to a professional poker player closing out his 20th day of losing to Libratus, a poker-playing bot that Brown co-created at Carnegie Mellon University.


Evolving Adaptive Poker Players for Effective Opponent Exploitation

AAAI Conferences

In many imperfect information games, the ability to exploit the opponent is crucial for achieving high performance. For instance, skilled poker players usually capitalize on various weaknesses in their opponents’ playing patterns and styles to maximize their earnings. Therefore, it is important to enable computer players in such games to identify flaws in opponent strategies and adapt their behaviors to exploit these flaws. This paper presents a genetic algorithm to evolve adaptive LSTM (Long Short-Term Memory) poker players featuring effective opponent exploitation. Experimental results in heads-up no-limit Texas Hold’em demonstrate that adaptive LSTM players are able to obtain 40% to 1360% more earnings than cutting-edge game-theoretic poker players against opponents with various flawed strategies. In addition, experimental results indicate that adaptive LSTM players evolved through playing against simple and weak rule-based opponents can achieve comparable performance against top game-theoretic poker players. The approach introduced in this paper is a promising start for building adaptive computer players for imperfect information games.


Carnegie Mellon's AI crushing poker pros

#artificialintelligence

You're no match for Libratus, the new and powerful king of the felt. The artificial intelligence developed at Carnegie Mellon University is blowing away some of mankind's best in Heads-Up No Limit Texas Hold'em, considered the final frontier of computer vs. human gamesmanship. Libratus and the pros – Dong Kim, Jimmy Chou, Jason Les and Daniel Mcauley – are playing a total of 120,000 hands over 20 days. With 101,908 hands in the bank, Libratus was ahead of all four by a combined total of almost $1.4 million in virtual chips. Kim was down by $22,309; Mcauley, $271,233; Chou, $365,559; and Les, $718,341.


In a casino in Pittsburgh, an AI program is beating poker champions for the first time

#artificialintelligence

The night before his newest poker competition was set to begin, Carnegie Mellon's Tuomas Sandholm and his PhD student Noam Brown sat down to play a little No Limit Texas Hold'em against the main competition: the artificial intelligence program they designed called "Libratus." "I was totally wrecked," Sandholm told The Washington Post. But he is not a serious poker player, so that's not such a big achievement. For the past 13 days, however, Libratus has been facing off against four world-champion poker players in a Pittsburgh casino. If it can beat them like it beat Sandholm, it would be an enormous breakthrough.


Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games Using Convolutional Networks

AAAI Conferences

Poker is a family of card games that includes many variations. We hypothesize that most poker games can be solved as a pattern matching problem, and propose creating a strong poker playing system based on a unified poker representation. Our poker player learns through iterative self-play, and improves its understanding of the game by training on the results of its previous actions without sophisticated domain knowledge. We evaluate our system on three poker games: single player video poker, two-player Limit Texas Hold’em, and finally two-player 2-7 triple draw poker. We show that our model can quickly learn patterns in these very different poker games while it improves from zero knowledge to a competitive player against human experts. The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a Convolutional Neural Network (CNN) based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained, and our system is competitive against human expert players.


Decision Generalisation from Game Logs in No Limit Texas Hold'em

AAAI Conferences

Given a set of data, recorded by observing the decisions of an expert player, we present a case-based framework that allows the successful generalisation of those decisions in the game of no limit Texas Hold'em. We address the problems of determining a suitable action abstraction and the resulting state translation that is required to map real-value bet amounts into a discrete set of abstract actions. We also detail the similarity metrics used in order to identify similar scenarios, without which no generalisation of playing decisions would be possible. We show that we were able to successfully generalise no limit betting decisions from recorded data via our agent, SartreNL, which achieved a 5th place finish out of 11 opponents at the 2012 Annual Computer Poker Competition.
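The state translation problem mentioned above, mapping a real-valued bet onto a discrete set of abstract actions, can be sketched as a nearest-neighbour lookup on the bet-to-pot ratio. The abstraction below (four pot fractions plus all-in) and every name in it are assumptions chosen for illustration; the abstract does not state which abstraction or translation rule SartreNL uses.

```python
# Hypothetical action abstraction: abstract bet name -> fraction of the pot.
ABSTRACT_BETS = {
    "quarter_pot": 0.25,
    "half_pot": 0.5,
    "pot": 1.0,
    "double_pot": 2.0,
}


def translate_bet(bet_amount, pot_size, stack_size):
    """Map a real-valued bet to the closest abstract action (hard translation)."""
    if bet_amount <= 0 or pot_size <= 0:
        raise ValueError("bet_amount and pot_size must be positive")
    # A bet that commits the whole remaining stack is treated as all-in.
    if bet_amount >= stack_size:
        return "all_in"
    ratio = bet_amount / pot_size
    # Pick the abstract bet whose pot fraction is nearest to the observed ratio.
    return min(ABSTRACT_BETS, key=lambda name: abs(ABSTRACT_BETS[name] - ratio))


if __name__ == "__main__":
    print(translate_bet(60, 100, 1000))    # ratio 0.6
    print(translate_bet(180, 100, 1000))   # ratio 1.8
    print(translate_bet(1000, 100, 1000))  # entire stack
```

Nearest-ratio matching is the simplest choice. Its weakness is that bets falling just either side of a boundary between two abstract actions are treated very differently, which is one reason the choice of translation rule matters when generalising from recorded decisions.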