Mastering Da Vinci Code: A Comparative Study of Transformer, LLM, and PPO-based Agents

Zhang, LeCheng, Wang, Yuanshi, Shen, Haotian, Wang, Xujie

arXiv.org Artificial Intelligence

The Da Vinci Code, a game of logical deduction and imperfect information, presents unique challenges for artificial intelligence, demanding nuanced reasoning beyond simple pattern recognition. This paper investigates the efficacy of various AI paradigms in mastering this game. We develop and evaluate three distinct agent architectures: a Transformer-based baseline model with limited historical context, several Large Language Model (LLM) agents (including Gemini, DeepSeek, and GPT variants) guided by structured prompts, and an agent based on Proximal Policy Optimization (PPO) employing a Transformer encoder for comprehensive game history processing. Performance is benchmarked against the baseline, with the PPO-based agent demonstrating superior win rates ($58.5\% \pm 1.0\%$), significantly outperforming the LLM counterparts. Our analysis highlights the strengths of deep reinforcement learning in policy refinement for complex deductive tasks, particularly in learning implicit strategies from self-play. We also examine the capabilities and inherent limitations of current LLMs in maintaining strict logical consistency and strategic depth over extended gameplay, despite sophisticated prompting. This study contributes to the broader understanding of AI in recreational games involving hidden information and multi-step logical reasoning, offering insights into effective agent design and the comparative advantages of different AI approaches.
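The PPO agent described above pairs a Transformer encoder over the full game history with an actor-critic head. A minimal sketch of that architecture, using PyTorch with purely illustrative sizes (token vocabulary, model width, and action count are assumptions, not the paper's values):

```python
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Encode the game history with a Transformer encoder, then emit
    policy logits and a value estimate (the PPO actor-critic heads)."""
    def __init__(self, n_tokens=64, d_model=32, n_actions=10):
        super().__init__()
        self.embed = nn.Embedding(n_tokens, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.policy = nn.Linear(d_model, n_actions)   # action logits
        self.value = nn.Linear(d_model, 1)            # state-value estimate

    def forward(self, history):  # history: (batch, seq) of event tokens
        h = self.encoder(self.embed(history))
        pooled = h.mean(dim=1)   # pool the encoded history
        return self.policy(pooled), self.value(pooled)

net = HistoryPolicy()
logits, value = net(torch.randint(0, 64, (2, 16)))
print(logits.shape, value.shape)
```

The policy logits would feed PPO's clipped-surrogate objective during self-play training; the details of the paper's actual encoding of Da Vinci Code game states are not reproduced here.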


Identify As A Human Does: A Pathfinder of Next-Generation Anti-Cheat Framework for First-Person Shooter Games

Zhang, Jiayi, Sun, Chenxin, Gu, Yue, Zhang, Qingyu, Lin, Jiayi, Du, Xiaojiang, Qian, Chenxiong

arXiv.org Artificial Intelligence

The gaming industry has experienced substantial growth, but cheating in online games poses a significant threat to the integrity of the gaming experience. Cheating, particularly in first-person shooter (FPS) games, can lead to substantial losses for the game industry. Existing anti-cheat solutions have limitations: client-side hardware constraints and security risks, unreliable server-side methods, and, on both sides, a lack of comprehensive real-world datasets. To address these limitations, the paper proposes HAWK, a server-side FPS anti-cheat framework for the popular game CS:GO. HAWK utilizes machine learning techniques to mimic human experts' identification process, leverages novel multi-view features, and is equipped with a well-defined workflow. The authors evaluate HAWK on the first large real-world datasets covering multiple cheat types and levels of cheating sophistication; it exhibits promising efficiency with acceptable overhead, shorter ban times than the in-use anti-cheat, a significant reduction in manual labor, and the ability to capture cheaters who evaded official inspection.


Efficient Learning in Chinese Checkers: Comparing Parameter Sharing in Multi-Agent Reinforcement Learning

Adhikari, Noah, Gu, Allen

arXiv.org Artificial Intelligence

We show that multi-agent reinforcement learning (MARL) with full parameter sharing outperforms independent and partially shared architectures in the competitive perfect-information homogeneous game of Chinese Checkers. To run our experiments, we develop a new MARL environment: variable-size, six-player Chinese Checkers. This custom environment was developed in PettingZoo and supports all traditional rules of the game, including chaining jumps. This is, to the best of our knowledge, the first implementation of Chinese Checkers that remains faithful to the true game. Chinese Checkers is difficult to learn due to its large branching factor and potentially infinite horizons. We borrow the concept of branching actions (submoves) from complex action spaces in other RL domains, where a submove may not end a player's turn immediately. This drastically reduces the dimensionality of the action space. Our observation space is inspired by AlphaGo, with many binary game boards stacked in a 3D array to encode information. The PettingZoo environment, training and evaluation logic, and analysis scripts can be found on GitHub: https://github.com/noahadhikari/pettingzoo-chinese-checkers
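The AlphaGo-style observation described above stacks one binary plane per player into a 3D array. A minimal sketch of that encoding in NumPy (board shape, player count, and the `encode_observation` helper are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def encode_observation(board, n_players=6):
    """board: 2D int array, 0 = empty, 1..n_players = that player's pieces.
    Returns an (n_players, H, W) stack of binary planes, one per player."""
    return np.stack([(board == p).astype(np.float32)
                     for p in range(1, n_players + 1)])

board = np.zeros((5, 5), dtype=int)
board[0, 0] = 1   # a piece belonging to player 1
board[4, 4] = 2   # a piece belonging to player 2
obs = encode_observation(board)
print(obs.shape)  # (6, 5, 5)
```

Each plane marks only one player's pieces, so a convolutional or other spatial network can read ownership directly from channel position rather than from cell values.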


Mastering the Game of Guandan with Deep Reinforcement Learning and Behavior Regulating

Yanggong, Yifan, Pan, Hao, Wang, Lei

arXiv.org Artificial Intelligence

Games are a simplified model of reality and often serve as a favored platform for Artificial Intelligence (AI) research. Much of the research is concerned with game-playing agents and their decision-making processes. The game of Guandan (literally, "throwing eggs") is a challenging game in which even professional human players struggle at times to make the right decision. In this paper we propose a framework named GuanZero for AI agents to master this game using Monte Carlo methods and deep neural networks. The main contribution of this paper is a method for regulating agents' behavior through a carefully designed neural-network encoding scheme. We then demonstrate the effectiveness of the proposed framework by comparing it with state-of-the-art approaches.


Vision Encoder-Decoder Models for AI Coaching

Nayak, Jyothi S, Khan, Afifah Khan Mohammed Ajmal, Manjeshwar, Chirag, Banday, Imadh Ajaz

arXiv.org Artificial Intelligence

This research paper introduces an innovative AI coaching approach by integrating vision-encoder-decoder models. The feasibility of this method is demonstrated using a Vision Transformer as the encoder and GPT-2 as the decoder, achieving a seamless integration of visual input and textual interaction. Departing from conventional practices of employing distinct models for image recognition and text-based coaching, our integrated architecture directly processes input images, enabling natural question-and-answer dialogues with the AI coach. This unique strategy simplifies model architecture while enhancing the overall user experience in human-AI interactions. We showcase sample results to demonstrate the capability of the model. The results underscore the methodology's potential as a promising paradigm for creating efficient AI coach models in various domains involving visual inputs. Importantly, this potential holds true regardless of the particular visual encoder or text decoder chosen. Additionally, we conducted experiments with different sizes of GPT-2 to assess the impact on AI coach performance, providing valuable insights into the scalability and versatility of our proposed methodology.
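The ViT-encoder/GPT-2-decoder pairing described above can be wired together with the Hugging Face Transformers `VisionEncoderDecoderModel` class. A minimal sketch using tiny randomly initialized configs (the paper presumably uses pretrained weights; all sizes here are illustrative assumptions chosen to keep the example small):

```python
import torch
from transformers import (ViTConfig, GPT2Config,
                          VisionEncoderDecoderConfig, VisionEncoderDecoderModel)

# Tiny ViT encoder and GPT-2 decoder configs; from_encoder_decoder_configs
# marks the decoder as a decoder and enables cross-attention automatically.
enc = ViTConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
                intermediate_size=64, image_size=32, patch_size=8)
dec = GPT2Config(n_embd=32, n_layer=2, n_head=2, vocab_size=100)
cfg = VisionEncoderDecoderConfig.from_encoder_decoder_configs(enc, dec)
model = VisionEncoderDecoderModel(config=cfg)

pixels = torch.randn(1, 3, 32, 32)        # one dummy image
tokens = torch.randint(0, 100, (1, 5))    # dummy answer tokens
out = model(pixel_values=pixels, decoder_input_ids=tokens)
print(out.logits.shape)  # one logit vector over the vocab per token
```

In practice one would load pretrained weights (e.g. with `from_encoder_decoder_pretrained`) and fine-tune on coaching dialogue, but the wiring is the same: the image is encoded once and the decoder attends to it while generating text.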


SAI: a Sensible Artificial Intelligence that plays with handicap and targets high scores in 9x9 Go (extended version)

Morandin, Francesco, Amato, Gianluca, Fantozzi, Marco, Gini, Rosa, Metta, Carlo, Parton, Maurizio

arXiv.org Artificial Intelligence

We develop a new model that can be applied to any perfect-information two-player zero-sum game to target a high score, and thus perfect play. We integrate this model into the Monte Carlo tree search-policy iteration learning pipeline introduced by Google DeepMind with AlphaGo. Training this model on 9x9 Go produces a superhuman Go player, thus proving that it is stable and robust. We show that this model can be used to effectively play with both positional and score handicap. We develop a family of agents that can target high scores against any opponent and recover from a very severe disadvantage against weak opponents. To the best of our knowledge, these are the first effective achievements in this direction.


AlphaGomoku: An AlphaGo-based Gomoku Artificial Intelligence using Curriculum Learning

Xie, Zheng, Fu, XingYu, Yu, JinYuan

arXiv.org Artificial Intelligence

In this project, we combine the AlphaGo algorithm with Curriculum Learning to crack the game of Gomoku. Modifications such as a Double Networks Mechanism and Winning Value Decay are implemented to address the intrinsic asymmetry and short-sightedness of Gomoku. Our final AI, AlphaGomoku, reached human playing level after two days' training on a single GPU. Free-style Gomoku is an interesting strategy board game with quite simple rules: two players alternately place black and white stones on a 15-by-15 grid, and the winner is the first to form a line of five or more consecutive stones of his or her color. It is popular among students, since it can be played with just a piece of paper and a pencil to pass the time in a boring class. It is also popular among computer scientists, since Gomoku is a natural playground for many artificial intelligence algorithms.
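The win condition stated above (five or more consecutive stones of one color in any line direction) can be checked in a few lines. A sketch in plain Python, where the board representation and `has_five` helper are illustrative assumptions:

```python
def has_five(board, row, col):
    """board: 2D list, None = empty, otherwise a player id.
    Returns True if the stone at (row, col) lies on a run of 5+ stones."""
    player = board[row][col]
    if player is None:
        return False
    h, w = len(board), len(board[0])
    # Four line directions: horizontal, vertical, both diagonals.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):  # walk outward in both directions
            r, c = row + sign * dr, col + sign * dc
            while 0 <= r < h and 0 <= c < w and board[r][c] == player:
                count += 1
                r, c = r + sign * dr, c + sign * dc
        if count >= 5:
            return True
    return False

board = [[None] * 15 for _ in range(15)]
for i in range(5):
    board[7][3 + i] = "black"     # a horizontal run of five
print(has_five(board, 7, 5))      # True
```

Checking only the lines through the most recent move keeps the per-move cost constant, which matters when a search algorithm evaluates many candidate positions.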


SAI, a Sensible Artificial Intelligence that plays Go

Morandin, Francesco, Amato, Gianluca, Gini, Rosa, Metta, Carlo, Parton, Maurizio, Pascutto, Gian-Carlo

arXiv.org Artificial Intelligence

The longstanding challenge in artificial intelligence of playing Go at professional human level has been successfully tackled in recent works [5, 7, 6], where software tools (AlphaGo, AlphaGo Zero, AlphaZero) combining neural networks and Monte Carlo tree search reached superhuman level. A recent development was Leela Zero [4], an open-source software whose neural network is trained over millions of games played in a distributed fashion, thus allowing improvements within reach of the resources of the academic community. However, all these programs suffer from a relevant limitation: it is impossible to target their victory margin. They are trained with a fixed komi of 7.5 and are built to maximize just the winning probability, without considering the score difference. This has several negative consequences for these programs: when they are ahead, they choose suboptimal moves and often win by a small margin; they cannot be used with a komi of 6.5, which is also common in professional games; and they play badly in handicap games, since winning probability is not a relevant signal in those situations. In principle all these problems could be overcome by replacing the binary reward (win 1, lose 0) with the game score difference, but the latter is known to be less robust [3, 8], and in general the strongest programs use the former, following the seminal works [1, 3, 2].