AITopics | mahjong

Can Large Language Models Master Complex Card Games?

Neural Information Processing Systems, Jun-15-2026, 16:00:57 GMT

Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and Chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data, and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance augmentation for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/

information, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment > Games > Go (0.48)
Leisure & Entertainment > Games > Chess (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Can Large Language Models Master Complex Card Games?

Wang, Wei, Bie, Fuqing, Chen, Junzhe, Zhang, Dan, Huang, Shiyu, Kharlamov, Evgeny, Tang, Jie

arXiv.org Artificial Intelligence, Oct-22-2025

Complex games have long been an important benchmark for testing the progress of artificial intelligence algorithms. AlphaGo, AlphaZero, and MuZero have defeated top human players in Go and Chess, garnering widespread societal attention towards artificial intelligence. Concurrently, large language models (LLMs) have exhibited remarkable capabilities across various tasks, raising the question of whether LLMs can achieve similar success in complex games. In this paper, we explore the potential of LLMs in mastering complex card games. We systematically assess the learning capabilities of LLMs across eight diverse card games, evaluating the impact of fine-tuning on high-quality gameplay data, and examining the models' ability to retain general capabilities while mastering these games. Our findings indicate that: (1) LLMs can approach the performance of strong game AIs through supervised fine-tuning on high-quality data, (2) LLMs can achieve a certain level of proficiency in multiple complex card games simultaneously, with performance augmentation for games with similar rules and conflicts for dissimilar ones, and (3) LLMs experience a decline in general capabilities when mastering complex games, but this decline can be mitigated by integrating a certain amount of general instruction data. The evaluation results demonstrate strong learning ability and versatility of LLMs. The code is available at https://github.com/THUDM/LLM4CardGame

information, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.01328

Country: Asia (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Go (0.48)
Leisure & Entertainment > Games > Chess (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Style-Preserving Policy Optimization for Game Agents

Li, Lingfeng, Lu, Yunlong, Wang, Yongyi, Li, Wenxin

arXiv.org Artificial Intelligence, Sep-23-2025

Proficient game agents with diverse play styles enrich the gaming experience and enhance the replay value of games. However, recent advancements in game AI based on reinforcement learning have predominantly focused on improving proficiency, whereas methods based on evolution algorithms generate agents with diverse play styles but exhibit subpar performance compared to RL methods. To address this gap, this paper proposes Mixed Proximal Policy Optimization (MPPO), a method designed to improve the proficiency of existing suboptimal agents while retaining their distinct styles. MPPO unifies loss objectives for both online and offline samples and introduces an implicit constraint to approximate demonstrator policies by adjusting the empirical distribution of samples. Empirical results across environments of varying scales demonstrate that MPPO achieves proficiency levels comparable to, or even superior to, pure online algorithms while preserving demonstrators' play styles. This work presents an effective approach for generating highly proficient and diverse game agents, ultimately contributing to more engaging gameplay experiences.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2506.16995

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

CFR-p: Counterfactual Regret Minimization with Hierarchical Policy Abstraction, and its Application to Two-player Mahjong

Wang, Shiheng

arXiv.org Artificial Intelligence, Jul-22-2023

Counterfactual Regret Minimization (CFR) has proven successful in Texas Hold'em poker. We apply this algorithm to another popular incomplete information game, Mahjong. Compared to poker, Mahjong is much more complex and has many variants. We study two-player Mahjong by conducting a game-theoretical analysis and building a hierarchical abstraction for CFR based on winning policies. This framework can be generalized to other imperfect information games.
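As a point of reference for readers unfamiliar with CFR, the sketch below shows regret matching, the per-decision update that CFR applies at every information set. It is not the CFR-p method from the paper: the game (rock-paper-scissors), the function names, and the self-play loop are all assumptions chosen only to keep the example small.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors (illustrative game, not Mahjong)


def payoff(a, b):
    """Payoff to the player choosing a against b: 1 win, 0 tie, -1 loss."""
    return [0, 1, -1][(a - b) % 3]


def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)


def train(iterations, seed=0):
    """Self-play for two players; returns each player's average strategy."""
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for p in range(2):
            opponent_move = moves[1 - p]
            actual = payoff(moves[p], opponent_move)
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than the move played.
                regrets[p][a] += payoff(a, opponent_move) - actual
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(train(20000)):
        print(player, [round(x, 3) for x in avg])
```

The average strategy, rather than the final one, is the quantity that regret-minimization arguments are stated for, which is why the loop accumulates `strategy_sums`.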

artificial intelligence, information, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2307.12087

Country:

North America > United States > Texas (0.25)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Poker (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.47)

A Fast Algorithm for Computing the Deficiency Number of a Mahjong Hand

Yan, Xueqing, Li, Yongming, Li, Sanjiang

arXiv.org Artificial Intelligence, Aug-15-2021

The tile-based multiplayer game Mahjong is widely played in Asia and has also become increasingly popular worldwide. Face-to-face or online, each player begins with a hand of 13 tiles and players draw and discard tiles in turn until they complete a winning hand. An important notion in Mahjong is the deficiency number (a.k.a. shanten number in Japanese Mahjong) of a hand, which estimates how many tile changes are necessary to complete the hand into a winning hand. The deficiency number plays an essential role in major decision-making tasks such as selecting a tile to discard. This paper proposes a fast algorithm for computing the deficiency number of a Mahjong hand. Compared with the baseline algorithm, the new algorithm is usually 100 times faster and, more importantly, respects the agent's knowledge about available tiles. The algorithm can be used as a basic procedure in all Mahjong variants by both rule-based and machine learning-based Mahjong AI.
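To make the notion of a deficiency number concrete, here is a minimal sketch that computes it for a 14-tile hand with the widely used block-counting formula (melds count twice, partial sets once). It is not the paper's fast algorithm. It assumes the standard winning shape of four melds plus one pair, ignores special hands such as seven pairs, ignores which tiles are still available, and may misjudge rare hands that would need a fifth copy of a tile. The hand notation (`m`, `p`, `s` for suits, `z` for honors) and all function names are assumptions made for illustration.

```python
def parse_hand(text):
    """Parse e.g. '123m456p789s11z' into 34 tile counts (m, p, s suits; z honors)."""
    counts = [0] * 34
    pending = []
    for ch in text:
        if ch.isdigit():
            pending.append(int(ch))
        else:
            base = "mpsz".index(ch) * 9
            for rank in pending:
                counts[base + rank - 1] += 1
            pending = []
    return counts


def _best_blocks(counts, i, melds, partials):
    """Best value of 2*melds + usable partial sets over all splits of tiles i..33."""
    while i < 34 and counts[i] == 0:
        i += 1
    if i == 34:
        return 2 * melds + min(partials, max(0, 4 - melds))
    suited = i < 27  # indices 27..33 are honors and cannot form sequences
    rank = i % 9
    options = []  # (tile offsets to remove, added melds, added partials)
    if counts[i] >= 3:
        options.append(((0, 0, 0), 1, 0))          # pung
    if suited and rank <= 6 and counts[i + 1] and counts[i + 2]:
        options.append(((0, 1, 2), 1, 0))          # chow
    if counts[i] >= 2:
        options.append(((0, 0), 0, 1))             # pair kept as a partial pung
    if suited and rank <= 7 and counts[i + 1]:
        options.append(((0, 1), 0, 1))             # adjacent partial chow
    if suited and rank <= 6 and counts[i + 2]:
        options.append(((0, 2), 0, 1))             # gapped partial chow
    options.append(((0,), 0, 0))                   # treat one copy as isolated
    best = 0
    for offsets, add_melds, add_partials in options:
        for off in offsets:
            counts[i + off] -= 1
        best = max(best, _best_blocks(counts, i, melds + add_melds,
                                      partials + add_partials))
        for off in offsets:
            counts[i + off] += 1
    return best


def deficiency(counts):
    """Tile changes needed to reach four melds plus a pair (0 = complete hand)."""
    counts = list(counts)
    best = 8 - _best_blocks(counts, 0, 0, 0)       # no pair reserved as the eye
    for i in range(34):
        if counts[i] >= 2:
            counts[i] -= 2                         # reserve this pair as the eye
            best = min(best, 7 - _best_blocks(counts, 0, 0, 0))
            counts[i] += 2
    return best + 1


if __name__ == "__main__":
    print(deficiency(parse_hand("123m456p789s111z23z")))
```

A discard policy could call `deficiency` on each candidate hand after a discard and prefer the tile whose removal leaves the lowest value, which is the decision-making use the abstract describes.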

algorithm, deficiency, knowledge base, (17 more...)

arXiv.org Artificial Intelligence

2108.06832

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Texas (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Microsoft's Mahjong-winning AI could lead to sophisticated finance market prediction systems

#artificialintelligence, Jul-23-2020, 10:11:01 GMT

Last August, Microsoft Research Asia detailed an AI system dubbed Super Phoenix (Suphx for short) that could defeat Mahjong players after learning from only 5,000 matches. A revised preprint paper out this week delves a bit deeper, revealing that Suphx -- whose performance improved with additional training -- is now rated above 99.99% of all ranked human players on Tenhou, a Japan-based global online Mahjong competition platform with over 350,000 members. Building superhuman programs for games is a longstanding goal of the AI research community -- and not without good reason. Games are an analog of the real world, with a measurable objective, and they can be played an unlimited number of times across hundreds (or thousands) of powerful machines. Moreover, Microsoft's researchers assert that the learnings are applicable to other domains, like the enterprise, where mundane but cognitively demanding tasks impact workers' productivity.

artificial intelligence, machine learning, suphx, (15 more...)

#artificialintelligence

Country: Asia > Japan (0.25)

Industry:

Leisure & Entertainment > Games (0.98)
Banking & Finance (0.78)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Last Week in AI

#artificialintelligence, Apr-8-2020, 01:46:09 GMT

Every week, Invector Labs publishes a newsletter that covers the most recent developments in AI research and technology. You can find this week's issue, and sign up for the newsletter, below. Games are often seen as a great benchmark for evaluating the ability of artificial intelligence (AI) algorithms to generalize knowledge. Of the different data environments that we can create, games come closest to resembling real-world environments.

benchmark, blog post, reinforcement, (8 more...)

#artificialintelligence

Industry:

Leisure & Entertainment > Games > Computer Games (0.42)
Health & Medicine (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Suphx: Mastering Mahjong with Deep Reinforcement Learning

Li, Junjie, Koyamada, Sotetsu, Ye, Qiwei, Liu, Guoqing, Wang, Chao, Yang, Ruihan, Zhao, Li, Qin, Tao, Liu, Tie-Yan, Hon, Hsiao-Wuen

arXiv.org Artificial Intelligence, Mar-31-2020

Artificial Intelligence (AI) has achieved great success in many domains, and game AI has been widely regarded as its beachhead since the dawn of AI. In recent years, studies on game AI have gradually evolved from relatively simple environments (e.g., perfect-information games such as Go, chess, shogi or two-player imperfect-information games such as heads-up Texas hold'em) to more complex ones (e.g., multi-player imperfect-information games such as multi-player Texas hold'em and StarCraft II). Mahjong is a popular multi-player imperfect-information game worldwide but very challenging for AI research due to its complex playing/scoring rules and rich hidden information. We design an AI for Mahjong, named Suphx, based on deep reinforcement learning with some newly introduced techniques including global reward prediction, oracle guiding, and run-time policy adaptation. Suphx has demonstrated stronger performance than most top human players in terms of stable rank and is rated above 99.99% of all the officially ranked human players on the Tenhou platform. This is the first time that a computer program has outperformed most top human players in Mahjong.

agent, mahjong, suphx, (17 more...)

arXiv.org Artificial Intelligence

2003.13590

Country:

North America > United States > Texas (0.44)
Asia > China (0.04)
Europe (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Building a Computer Mahjong Player via Deep Convolutional Neural Networks

Gao, Shiqi, Okuya, Fuminori, Kawahara, Yoshihiro, Tsuruoka, Yoshimasa

arXiv.org Artificial Intelligence, Jun-7-2019

The evaluation function for imperfect information games is always hard to define but has a significant impact on the playing strength of a program. Deep learning has made great advances in recent years and has already exceeded the level of top human players, even in the game of Go. In this paper, we introduce a new data model to represent the available imperfect information on the game table, and construct a well-designed convolutional neural network for game record training. We choose the accuracy of tile discarding, also called the agreement rate, as the benchmark for this study. Our accuracy on test data reaches 70.44%, while the state-of-the-art baseline is 62.1%, reported by Mizukami and Tsuruoka (2015); this is significantly higher than previous trials using deep learning and shows the promising potential of our new model. To build the AI program, besides the tile discarding strategy, we adopt similar prediction strategies for other actions such as stealing (pon, chi, and kan) and riichi. Combining these prediction networks, and without any knowledge of the concrete rules of the game, we evaluate the strength of the resulting program on the largest Japanese Mahjong site, 'Tenhou'. The program achieved a rating of around 1850, which is significantly higher than that of an average human player and of programs from past studies.
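The agreement rate used as the benchmark above is simply the fraction of positions in which the model's chosen discard matches the discard in the game record. A minimal sketch follows; the function name and the tile labels are hypothetical, and the paper's own evaluation code is not shown in this listing.

```python
def agreement_rate(predicted, expert):
    """Fraction of positions where the predicted discard equals the recorded one."""
    if len(predicted) != len(expert):
        raise ValueError("predicted and expert must have the same length")
    if not expert:
        raise ValueError("at least one position is required")
    matches = sum(p == e for p, e in zip(predicted, expert))
    return matches / len(expert)


if __name__ == "__main__":
    # Hypothetical discards for four positions; three of the four agree.
    model_discards = ["1m", "5p", "9s", "1z"]
    record_discards = ["1m", "5p", "2s", "1z"]
    print(agreement_rate(model_discards, record_discards))
```

Note that a high agreement rate measures imitation of the recorded players, not playing strength directly, which is why the abstract also reports a rating on Tenhou.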

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1906.02146

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.16)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Method for Constructing Artificial Intelligence Player with Abstraction to Markov Decision Processes in Multiplayer Game of Mahjong

Kurita, Moyuru, Hoki, Kunihito

arXiv.org Artificial Intelligence, Apr-16-2019

We propose a method for constructing an artificial intelligence (AI) player for mahjong, which is a multiplayer imperfect information game. Since the size of the game tree is huge, constructing an expert-level AI player of mahjong is challenging. We define multiple Markov decision processes (MDPs) as abstractions of mahjong to construct effective search trees. We also introduce two methods of inferring state values of the original mahjong using these MDPs. We evaluated the effectiveness of our method using gameplays vis-à-vis the current strongest AI player.

ai player, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1904.07491

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology: