mahjong
Can Large Language Models Master Complex Card Games?
Wang, Wei, Bie, Fuqing, Chen, Junzhe, Zhang, Dan, Huang, Shiyu, Kharlamov, Evgeny, Tang, Jie
Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and Chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data, and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance augmentation for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/LLM4CardGame
Style-Preserving Policy Optimization for Game Agents
Li, Lingfeng, Lu, Yunlong, Wang, Yongyi, Li, Wenxin
Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency, whereas methods based on evolutionary algorithms generate agents with diverse play styles but exhibit subpar performance compared to RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies loss objectives for both online and offline samples and introduces an implicit constraint to approximate demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency levels comparable to, or even superior to, pure online algorithms while preserving demonstrators' play styles. This work presents an effective approach for generating highly proficient and diverse game agents, ultimately contributing to more engaging gameplay experiences.
CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong
Counterfactual Regret Minimization (CFR) has been successful in Texas Hold'em poker. We apply this algorithm to another popular incomplete information game, Mahjong. Compared with poker, Mahjong is much more complex and has many variants. We study two-player Mahjong by conducting a game-theoretic analysis and adding a hierarchical abstraction to CFR based on winning policies. This framework can be generalized to other imperfect information games.
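For readers unfamiliar with CFR, the sketch below shows regret matching, the rule that standard CFR uses at each decision point to turn accumulated regrets into a strategy. It is a minimal illustration only and does not reproduce the hierarchical abstraction of CFR-p; the function name `regret_matching` and the example numbers are assumptions made for this sketch.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (one probability per action).

    Actions with positive regret are played in proportion to that regret;
    if no action has positive regret, the strategy falls back to uniform.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n


# Hypothetical regrets for three actions at one information set.
print(regret_matching([1.0, -1.0, 3.0]))  # [0.25, 0.0, 0.75]
```

In full CFR this rule is applied at every information set on each iteration, and the average of the resulting strategies over iterations is what approaches equilibrium in two-player zero-sum games.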
A Fast Algorithm for Computing the Deficiency Number of a Mahjong Hand
Yan, Xueqing, Li, Yongming, Li, Sanjiang
The tile-based multiplayer game Mahjong is widely played in Asia and has also become increasingly popular worldwide. Face-to-face or online, each player begins with a hand of 13 tiles and players draw and discard tiles in turn until they complete a winning hand. An important notion in Mahjong is the deficiency number (a.k.a. shanten number in Japanese Mahjong) of a hand, which estimates how many tile changes are necessary to complete the hand into a winning hand. The deficiency number plays an essential role in major decision-making tasks such as selecting a tile to discard. This paper proposes a fast algorithm for computing the deficiency number of a Mahjong hand. Compared with the baseline algorithm, the new algorithm is usually 100 times faster and, more importantly, respects the agent's knowledge about available tiles. The algorithm can be used as a basic procedure in all Mahjong variants by both rule-based and machine learning-based Mahjong AI.
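To make the notion of a deficiency number concrete, here is a brute-force sketch that follows the definition directly. It is not the fast algorithm proposed in the paper. It assumes a simplified single-suit game (ranks 1 to 9, four copies of each tile), a 14-tile hand, and the standard winning shape of four melds plus one pair, and it ignores which tiles are still available. All names (`winning_hands`, `deficiency`) are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement

RANKS = range(1, 10)
CHOWS = [(r, r + 1, r + 2) for r in range(1, 8)]  # three consecutive tiles
PUNGS = [(r, r, r) for r in RANKS]                # three identical tiles
MELDS = CHOWS + PUNGS


def winning_hands():
    """Yield each distinct 14-tile winning hand (4 melds + 1 pair) as a Counter."""
    seen = set()
    for melds in combinations_with_replacement(MELDS, 4):
        for pair_rank in RANKS:
            tiles = Counter(t for meld in melds for t in meld)
            tiles[pair_rank] += 2
            if max(tiles.values()) > 4:  # only four copies of each tile exist
                continue
            key = tuple(tiles[r] for r in RANKS)
            if key not in seen:
                seen.add(key)
                yield tiles


def deficiency(hand):
    """Minimum number of tile changes that turn `hand` into a winning hand."""
    hand = Counter(hand)
    if sum(hand.values()) != 14:
        raise ValueError("expected a 14-tile hand")
    # Tiles shared with a target hand can stay; every other tile must change.
    best_overlap = max(sum((hand & w).values()) for w in winning_hands())
    return 14 - best_overlap


print(deficiency([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6]))  # 0
print(deficiency([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 6, 9]))  # 1
```

The first hand is already complete (pair 11 with melds 123, 234, 234, 456). The second needs one change, for example replacing the 9 with a 6. Enumerating every winning hand is only practical in this toy setting.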
Microsoft's Mahjong-winning AI could lead to sophisticated finance market prediction systems
Last August, Microsoft Research Asia detailed an AI system dubbed Super Phoenix (Suphx for short) that could defeat Mahjong players after learning from only 5,000 matches. A revised preprint paper out this week delves a bit deeper, revealing that Suphx -- whose performance improved with additional training -- is now rated above 99.99% of all ranked human players on Tenhou, a Japan-based global online Mahjong competition platform with over 350,000 members. Building superhuman programs for games is a longstanding goal of the AI research community -- and not without good reason. Games are an analog of the real world, with a measurable objective, and they can be played an unlimited number of times across hundreds (or thousands) of powerful machines. Moreover, Microsoft's researchers assert that the learnings are applicable to other domains, like the enterprise, where mundane but cognitively demanding tasks impact workers' productivity.
Last Week in AI
Every week, Invector Labs publishes a newsletter that covers the most recent developments in AI research and technology. You can find this week's issue below. Games are often seen as a great benchmark for evaluating the ability of artificial intelligence (AI) algorithms to generalize knowledge. Of the different data environments that we can create, games come closest to resembling real-world environments.
Suphx: Mastering Mahjong with Deep Reinforcement Learning
Li, Junjie, Koyamada, Sotetsu, Ye, Qiwei, Liu, Guoqing, Wang, Chao, Yang, Ruihan, Zhao, Li, Qin, Tao, Liu, Tie-Yan, Hon, Hsiao-Wuen
Artificial Intelligence (AI) has achieved great success in many domains, and game AI has been widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, shogi or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold'em and StarCraft II). Mahjong is a popular multi-player imperfect-information game worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players in the Tenhou platform. This is the first time that a computer program outperforms most top human players in Mahjong.
Building a Computer Mahjong Player via Deep Convolutional Neural Networks
Gao, Shiqi, Okuya, Fuminori, Kawahara, Yoshihiro, Tsuruoka, Yoshimasa
The evaluation function for imperfect information games is always hard to define but has a significant impact on the playing strength of a program. Deep learning has made great achievements in recent years and has already exceeded the level of top human players, even in the game of Go. In this paper, we introduce a new data model to represent the available imperfect information on the game table, and construct a well-designed convolutional neural network for game record training. We choose the accuracy of tile discarding, also called the agreement rate, as the benchmark for this study. Our accuracy on test data reaches 70.44%, while the state-of-the-art baseline is 62.1%, reported by Mizukami and Tsuruoka (2015); our result is also significantly higher than previous trials using deep learning, which shows the promising potential of our new model. For the AI program building, besides the tile discarding strategy, we adopt similar predicting strategies for other actions such as stealing (pon, chi, and kan) and riichi. With the simple combination of these several predicting networks and without any knowledge about the concrete rules of the game, a strength evaluation is made for the resulting program on the largest Japanese Mahjong site, 'Tenhou'. The program has achieved a rating of around 1850, which is significantly higher than that of an average human player and of programs from past studies.
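The agreement rate used as the benchmark above is simply the fraction of positions in which the model's chosen discard matches the discard recorded in the game log. A minimal sketch, assuming discards are encoded as strings; the function name `agreement_rate` and the tile labels are hypothetical and not taken from the paper:

```python
def agreement_rate(predicted, recorded):
    """Fraction of positions where the predicted discard equals the recorded one."""
    if len(predicted) != len(recorded):
        raise ValueError("both sequences must have the same length")
    if not recorded:
        raise ValueError("need at least one position")
    matches = sum(p == r for p, r in zip(predicted, recorded))
    return matches / len(recorded)


# Hypothetical discards for four positions; three of the four match.
model_discards = ["1m", "5p", "9s", "E"]
record_discards = ["1m", "5p", "2s", "E"]
print(agreement_rate(model_discards, record_discards))  # 0.75
```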
Method for Constructing Artificial Intelligence Player with Abstraction to Markov Decision Processes in Multiplayer Game of Mahjong
Kurita, Moyuru, Hoki, Kunihito
We propose a method for constructing artificial intelligence (AI) of mahjong, which is a multiplayer imperfect information game. Since the size of the game tree is huge, constructing an expert-level AI player of mahjong is challenging. We define multiple Markov decision processes (MDPs) as abstractions of mahjong to construct effective search trees. We also introduce two methods of inferring state values of the original mahjong using these MDPs. We evaluated the effectiveness of our method using gameplays vis-à-vis the current strongest AI player.
Let's Play Mahjong!
In the past decades, we have seen AI programs that can beat the best human players in perfect information games including checkers, chess and Go, where players know everything that has occurred in the game before making a decision. Imperfect information games are more challenging. Very recently, important progress has been made in solving the two-player heads-up limit Texas hold'em poker [2] and its no-limit version [3], which are the smallest variants of poker played competitively by humans. In this paper, we initiate a mathematical and AI study of the more popular and more complicated Mahjong game. Mahjong is a very popular tile-based multiplayer game played worldwide.