Goto

Collaborating Authors

 game type


Strategic Communication and Language Bias in Multi-Agent LLM Coordination

arXiv.org Artificial Intelligence

Large Language Model (LLM)-based agents are increasingly deployed in multi-agent scenarios where coordination is crucial but not always assured. Research shows that the way strategic scenarios are framed linguistically can affect cooperation. This paper explores whether allowing agents to communicate amplifies these language-driven effects. Leveraging FAIRGAME, we simulate one-shot and repeated games across different languages and models, both with and without communication. Our experiments, conducted with two advanced LLMs-GPT-4o and Llama 4 Maverick-reveal that communication significantly influences agent behavior, though its impact varies by language, personality, and game structure. These findings underscore the dual role of communication in fostering coordination and reinforcing biases.


Fast and the Furious: Hot Starts in Pursuit-Evasion Games

arXiv.org Artificial Intelligence

Effectively positioning pursuers in pursuit-evasion games without prior knowledge of evader locations remains a significant challenge. A novel approach that combines game-theoretic control theory with Graph Neural Networks is introduced in this work. By conceptualizing pursuer configurations as strategic arrangements and representing them as graphs, a Graph Characteristic Space is constructed via multi-objective optimization to identify Pareto-optimal configurations. A Graph Convolutional Network (GCN) is trained on these Pareto-optimal graphs to generate strategically effective initial configurations, termed "hot starts". Empirical evaluations demonstrate that the GCN-generated hot starts provide a significant advantage over random configurations. In scenarios considering multiple pursuers and evaders, this method hastens the decline in evader survival rates, reduces pursuer travel distances, and enhances containment, showcasing clear strategic benefits.


Who is a Better Player: LLM against LLM

arXiv.org Artificial Intelligence

Adversarial board games, as a paradigmatic domain of strategic reasoning and intelligence, have long served as both a popular competitive activity and a benchmark for evaluating artificial intelligence (AI) systems. Building on this foundation, we propose an adversarial benchmarking framework to assess the comprehensive performance of Large Language Models (LLMs) through board games competition, compensating the limitation of data dependency of the mainstream Question-and-Answer (Q&A) based benchmark method. We introduce Qi Town, a specialized evaluation platform that supports 5 widely played games and involves 20 LLM-driven players. The platform employs both the Elo rating system and a novel Performance Loop Graph (PLG) to quantitatively evaluate the technical capabilities of LLMs, while also capturing Positive Sentiment Score (PSS) throughout gameplay to assess mental fitness. The evaluation is structured as a round-robin tournament, enabling systematic comparison across players. Experimental results indicate that, despite technical differences, most LLMs remain optimistic about winning and losing, demonstrating greater adaptability to high-stress adversarial environments than humans. On the other hand, the complex relationship between cyclic wins and losses in PLGs exposes the instability of LLMs' skill play during games, warranting further explanation and exploration.


Tokyo's Best Video Game Arcades in Akihabara: Where to Go, What to Do

WIRED

As we stumble out of our third Japanese arcade and onto the glowing sidewalks of Tokyo's Akihabara ward, I catch eyes with my partner and laugh. In this perfect moment together, we're equally content but exhausted from a day of vigorous gaming. Holding hands as we squat outside on a slender metal railing and chug melon sodas from a nearby vending machine, I promise the next game center will be our last stop in today's marathon. Popular in Japan since the 1970s, video game arcades took a major hit during the coronavirus pandemic. Strict limitations on public gatherings led to businesses shutting down.


Lost in the Logic: An Evaluation of Large Language Models' Reasoning Capabilities on LSAT Logic Games

arXiv.org Artificial Intelligence

In this thesis, I evaluate the performance of Large Language Models (LLMs) on the Law School Admissions Test (LSAT), specifically the Logic Games section of the test. I focus on this section because it presents a complex logical reasoning task and thus is a valuable source of data for evaluating how modern, increasingly capable LLMs can handle hard logical reasoning tasks. I construct a dataset of LSAT logic games and their associated metadata, and extensively evaluate LLMs' performance in a Chain-of-Thought prompting setting. Given the weak performance in this setting, I explore other prompting frameworks on a smaller subset of the dataset, adapting ideas from Reflexion to this task. This results in a substantially improved accuracy of 70 percent for GPT-4 and 46 percent for GPT-3.5 on this data subset, highlighting the capacity of LLMs to revise their logical errors, despite initially weak performance. Finally, I analyze the types of logic games that models perform better or worse on, as well as the types of logical errors I observe from human annotation, providing detailed insights on the logical reasoning capabilities of LLMs.


On the recognition of the game type based on physiological signals and eye tracking

arXiv.org Artificial Intelligence

Automated interpretation of signals yields many impressive applications from the area of affective computing and human activity recognition (HAR). In this paper we ask the question about possibility of cognitive activity recognition on the base of particular set of signals. We use recognition of the game played by the participant as a playground for exploration of the problem. We build classifier of three different games (Space Invaders, Tetris, Tower Defence) and inter-game pause. We validate classifier in the player-independent and player-dependent scenario. We discuss the improvement in the player-dependent scenario in the context of biometric person recognition. On the base of the results obtained in game classification, we consider potential applications in smart surveillance and quantified self.


DeepMind details OpenSpiel, a collection of AI training tools for video games

#artificialintelligence

Reinforcement learning, the AI training technique that's brought to fruition systems capable of defeating world poker champions and guiding self-driving cars, isn't the simplest thing in the world to wrangle. That's particularly true in the gaming domain, where cutting-edge approaches sometimes require bespoke tools that aren't publicly available. In a paper recently published on the preprint server Arxiv.org, At its core, it's a collection of environments and algorithms for research in general reinforcement learning and search and planning in games, with tools to analyze learning dynamics and other common evaluation metrics. "The purpose of OpenSpiel is to promote general multiagent reinforcement learning across many different game types, in a similar way as general game-playing but with a heavy emphasis on learning and not in competition form," wrote the researchers.


Policy Based Inference in Trick-Taking Card Games

arXiv.org Artificial Intelligence

Trick-taking card games feature a large amount of private information that slowly gets revealed through a long sequence of actions. This makes the number of histories exponentially large in the action sequence length, as well as creating extremely large information sets. As a result, these games become too large to solve. To deal with these issues many algorithms employ inference, the estimation of the probability of states within an information set. In this paper, we demonstrate a Policy Based Inference (PI) algorithm that uses player modelling to infer the probability we are in a given state. We perform experiments in the German trick-taking card game Skat, in which we show that this method vastly improves the inference as compared to previous work, and increases the performance of the state-of-the-art Skat AI system Kermit when it is employed into its determinized search algorithm.


Learning Policies from Human Data for Skat

arXiv.org Artificial Intelligence

Decision-making in large imperfect information games is difficult. Thanks to recent success in Poker, Counterfactual Regret Minimization (CFR) methods have been at the forefront of research in these games. However, most of the success in large games comes with the use of a forward model and powerful state abstractions. In trick-taking card games like Bridge or Skat, large information sets and an inability to advance the simulation without fully determinizing the state make forward search problematic. Furthermore, state abstractions can be especially difficult to construct because the precise holdings of each player directly impact move values. In this paper we explore learning model-free policies for Skat from human game data using deep neural networks (DNN). We produce a new state-of-the-art system for bidding and game declaration by introducing methods to a) directly vary the aggressiveness of the bidder and b) declare games based on expected value while mitigating issues with rarely observed state-action pairs. Although cardplay policies learned through imitation are slightly weaker than the current best search-based method, they run orders of magnitude faster. We also explore how these policies could be learned directly from experience in a reinforcement learning setting and discuss the value of incorporating human data for this task.


Improving Search with Supervised Learning in Trick-Based Card Games

arXiv.org Artificial Intelligence

In trick-taking card games, a two-step process of state sampling and evaluation is widely used to approximate move values. While the evaluation component is vital, the accuracy of move value estimates is also fundamentally linked to how well the sampling distribution corresponds the true distribution. Despite this, recent work in trick-taking card game AI has mainly focused on improving evaluation algorithms with limited work on improving sampling. In this paper, we focus on the effect of sampling on the strength of a player and propose a novel method of sampling more realistic states given move history. In particular, we use predictions about locations of individual cards made by a deep neural network --- trained on data from human gameplay - in order to sample likely worlds for evaluation. This technique, used in conjunction with Perfect Information Monte Carlo (PIMC) search, provides a substantial increase in cardplay strength in the popular trick-taking card game of Skat.