avalon
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
912d2b1c7b2826caf99687388d2e8f7c-AuthorFeedback.pdf
I think the author needs to argue why Avalon is a better agent for real-world hidden role scenarios than other5 games? Among hidden role games, Avalon is one of the most popular and widely played games (according to6 boardgamegeek.com) Inthe real world, there are7 subtle cues which can often be misinterpreted when others are acting under uncertainty. AlphaGo-likemethods can be used when12 the board state alone is sufficient to determine the best move, but in imperfect information games it is necessary to13 consider how players acted to reach the current board state. We've added the following sentence: "Compared to14 MCTS-based methods like AlphaGo, CFR-based methods like DeepStack and DeepRole can soundly reason over15 hiddeninformation."16
Finding Friend and Foe in Multi-Agent Games
Recent breakthroughs in AI for multi-agent games like Go, Poker, and Dota, have seen great strides in recent years. Yet none of these games address the real-life challenge of cooperation in the presence of unknown and uncertain teammates. This challenge is a key game mechanism in hidden role games. Here we develop the DeepRole algorithm, a multi-agent reinforcement learning agent that we test on The Resistance: Avalon, the most popular hidden role game. DeepRole combines counterfactual regret minimization (CFR) with deep value networks trained through self-play.
Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds
Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available. We find that standard RL baselines make progress on most tasks but are still far from human performance, suggesting Avalon is challenging enough to advance the quest for generalizable RL.
CSP4SDG: Constraint and Information-Theory Based Role Identification in Social Deduction Games with LLM-Enhanced Inference
Xu, Kaijie, Meng, Fandi, Verbrugge, Clark, Lucas, Simon
In Social Deduction Games (SDGs) such as Avalon, Mafia, and W erewolf, players conceal their identities and deliberately mislead others, making hidden-role inference a central and demanding task. Accurate role identification, which forms the basis of an agent's belief state, is therefore the keystone for both human and AI performance. We introduce CSP4SDG, a probabilistic, constraint-satisfaction framework that analyses gameplay objectively. Game events and dialogue are mapped to four linguistically-agnostic constraint classes--evidence, phenomena, assertions, and hypotheses. Hard constraints prune impossible role assignments, while weighted soft constraints score the remainder; information-gain weighting links each hypothesis to its expected value under entropy reduction, and a simple closed-form scoring rule guarantees that truthful assertions converge to classical hard logic with minimum error. The resulting posterior over roles is fully interpretable and updates in real time. Experiments on three public datasets show that CSP4SDG (i) outperforms LLM-based baselines in every inference scenario, and (ii) boosts LLMs when supplied as an auxiliary "reasoning tool." Our study validates that principled probabilistic reasoning with information theory is a scalable alternative--or complement--to heavy-weight neural models for SDGs.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > Hawaii (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Texas (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Education (0.68)
- Leisure & Entertainment > Games > Computer Games (0.47)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
- Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds
Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available.