Goto

Collaborating Authors

 minesweeper


Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

arXiv.org Artificial Intelligence

The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on immense experience, encoding their knowledge opaquely within neural network weights. We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a novel agent architecture that leverages a Large Language Model (LLM) to build an explicit, language-based understanding of its environment's mechanics and its own strategy. Starting from a tabula rasa state with no prior knowledge (except action set), CEL operates on a cycle of interaction and reflection. After each episode, the agent analyzes its complete trajectory to perform two concurrent learning processes: Rule Induction, where it refines its explicit model of the environment's dynamics, and Strategy and Playbook Summarization, where it distills experiences into an actionable strategic playbook. We evaluate CEL on diverse grid-world tasks (i.e., Minesweeper, Frozen Lake, and Sokoban), and show that the CEL agent successfully learns to master these games by autonomously discovering their rules and developing effective policies from sparse rewards. Ablation studies confirm that the iterative process is critical for sustained learning. Our work demonstrates a path toward more general and interpretable agents that not only act effectively but also build a transparent and improving model of their world through explicit reasoning on raw experience.


Are Large Vision Language Models Good Game Players?

arXiv.org Artificial Intelligence

Large Vision Language Models (LVLMs) have demonstrated remarkable abilities in understanding and reasoning about both visual and textual information. However, existing evaluation methods for LVLMs, primarily based on benchmarks like Visual Question Answering and image captioning, often fail to capture the full scope of LVLMs' capabilities. These benchmarks are limited by issues such as inadequate assessment of detailed visual perception, data contamination, and a lack of focus on multi-turn reasoning. To address these challenges, we propose LVLM-Playground, a game-based evaluation framework designed to provide a comprehensive assessment of LVLMs' cognitive and reasoning skills in structured environments. LVLM-Playground uses a set of games to evaluate LVLMs on four core tasks: Perceiving, Question Answering, Rule Following, and End-to-End Playing, with each target task designed to assess specific abilities, including visual perception, reasoning, decision-making, etc.


Shtetl-Optimized » Blog Archive » My AI Safety Lecture for UT Effective Altruism

#artificialintelligence

Two weeks ago, I gave a lecture setting out my current thoughts on AI safety, halfway through my year at OpenAI. I was asked to speak by UT Austin's Effective Altruist club. You can watch the lecture on YouTube here (I recommend 2x speed). The timing turned out to be weird, coming immediately after the worst disaster to hit the Effective Altruist movement in its history, as I acknowledged in the talk. I then spent 20 minutes taking questions. For those who (like me) prefer text over video, below I've produced an edited transcript, by starting with YouTube's automated transcript and then, well, editing it. Thank you so much for inviting me here. I do feel a little bit sheepish to be lecturing you about AI safety, as someone who's worked on this subject for all of five months. But this past spring, I accepted an extremely interesting opportunity to go on leave for a year to think about what theoretical computer science can do for AI safety. I'm doing this at OpenAI, which is one of the world's leading AI startups, based in San Francisco although I'm mostly working from Austin. Despite its name, OpenAI is famously not 100% open … so there are certain topics that I'm not allowed to talk about, like the capabilities of the very latest systems and whether or not they'll blow people's minds when released. By contrast, OpenAI is very happy for me to talk about AI safety: what it is and and what if anything can we do about it. So what I thought I'd do is to tell you a little bit about the specific projects that I've been working on at OpenAI, but also just, as an admitted newcomer, share some general thoughts about AI safety and how Effective Altruists might want to think about it. I'll try to leave plenty of time for discussion. Maybe I should mention that the thoughts that I'll tell you today are ones that, until last week, I had considered writing up for an essay contest run by something called the FTX Future Fund. Unfortunately, the FTX Future Fund no longer exists. It was founded by someone named Sam Bankman-Fried, whose a net worth went from 15 billion dollars to some negative number of dollars in the space of two days, in one of the biggest financial scandals in memory. This is obviously a calamity for the EA community, which had been counting on funding from this individual. I feel terrible about all the projects left in the lurch, to say nothing of FTX's customers. Let's start with this: raise your hand if you've tried GPT-3.


Neural Network Learner for Minesweeper

arXiv.org Artificial Intelligence

Minesweeper is an interesting single player game based on logic, memory and guessing. Solving Minesweeper has been shown to be an NP-hard task. Deterministic solvers are the best known approach for solving Minesweeper. This project proposes a neural network based learner for solving Minesweeper. To choose the best learner, different architectures and configurations of neural networks were trained on hundreds of thousands of games. Surprisingly, the proposed neural network based learner has shown to be a very good approximation function for solving Minesweeper. The neural network learner competes well with the CSP solvers, especially in Beginner and Intermediate modes of the game. It was also observed that despite having high success rates, the best neural learner was considerably slower than the best deterministic solver. This report also discusses the overheads and limitations faced while creating highly successful neural networks for Minesweeper.


Gaming the Known and Unknown via Puzzle Solving With an Artificial Intelligence Agent

#artificialintelligence

Researchers design multiple strategies for an artificial intelligent (AI) agent to solve a stochastic puzzle like Minesweeper. For decades, efforts in solving games had been exclusive to solving two-player games (i.e., board games like checkers, chess-like games, etc.), where the game outcome can be correctly and efficiently predicted by applying some artificial intelligence (AI) search technique and collecting a massive amount of gameplay statistics. However, such a method and technique cannot be applied directly to the puzzle-solving domain since puzzles are generally played alone (single-player) and have unique characteristics (such as stochastic or hidden information). So then, a question arose as to how the AI technique can retain its performance for solving two-player games but instead applied to a single-agent puzzle? For years, puzzles and games had been regarded as interchangeable or one part of the other.


Fast constraint satisfaction problem and learning-based algorithm for solving Minesweeper

arXiv.org Artificial Intelligence

Minesweeper is a popular spatial-based decision-making game that works with incomplete information. As an exemplary NP-complete problem, it is a major area of research employing various artificial intelligence paradigms. The present work models this game as Constraint Satisfaction Problem (CSP) and Markov Decision Process (MDP). We propose a new method named as dependents from the independent set using deterministic solution search (DSScsp) for the faster enumeration of all solutions of a CSP based Minesweeper game and improve the results by introducing heuristics. Using MDP, we implement machine learning methods on these heuristics. We train the classification model on sparse data with results from CSP formulation. We also propose a new rewarding method for applying a modified deep Q-learning for better accuracy and versatile learning in the Minesweeper game. The overall results have been analyzed for different kinds of Minesweeper games and their accuracies have been recorded. Results from these experiments show that the proposed method of MDP based classification model and deep Q-learning overall is the best methods in terms of accuracy for games with given mine densities.


Reinforcement Learning For Constraint Satisfaction Game Agents (15-Puzzle, Minesweeper, 2048, and Sudoku)

arXiv.org Artificial Intelligence

In recent years, reinforcement learning has seen interest because of deep Q-Learning, where the model is a convolutional neural network. Deep Q-Learning has shown promising results in games such as Atari and AlphaGo. Instead of learning the entire Q-table, it learns an estimate of the Q function that determines a state's policy action. We use Q-Learning and deep Q-learning, to learn control policies of four constraint satisfaction games (15-Puzzle, Minesweeper, 2048, and Sudoku). 15-Puzzle is a sliding permutation puzzle and provides a challenge in addressing its large state space. Minesweeper and Sudoku involve partially observable states and guessing. 2048 is also a sliding puzzle but allows for easier state representation (compared to 15-Puzzle) and uses interesting reward shaping to solve the game. These games offer unique insights into the potential and limits of reinforcement learning. The Q agent is trained with no rules of the game, with only the reward corresponding to each state's action. Our unique contribution is in choosing the reward structure, state representation, and formulation of the deep neural network. For low shuffle, 15-Puzzle, achieves a 100% win rate, the medium and high shuffle achieve about 43% and 22% win rates respectively. On a standard 16x16 Minesweeper board, both low and high-density boards achieve close to 45% win rate, whereas medium density boards have a low win rate of 15%. For 2048, the 1024 win rate was achieved with significant ease (100%) with high win rates for 2048, 4096, 8192 and 16384 as 40%, 0.05%, 0.01% and 0.004% , respectively. The easy Sudoku games had a win rate of 7%, while medium and hard games had 2.1% and 1.2% win rates, respectively. This paper explores the environment complexity and behavior of a subset of constraint games using reward structures which can get us closer to understanding how humans learn.


A Phase Transition in Minesweeper

arXiv.org Artificial Intelligence

We study the average-case complexity of the classic Minesweeper game in which players deduce the locations of mines on a two-dimensional lattice. Playing Minesweeper is known to be co-NP-complete. We show empirically that Minesweeper exhibits a phase transition analogous to the well-studied SAT phase transition. Above the critical mine density it becomes almost impossible to play Minesweeper by logical inference. We use a reduction to Boolean unsatisfiability to characterize the hardness of Minesweeper instances, and show that the hardness peaks at the phase transition. Furthermore, we demonstrate algorithmic barriers at the phase transition for polynomial-time approaches to Minesweeper inference. Finally, we comment on expectations for the asymptotic behavior of the phase transition.


Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy

arXiv.org Artificial Intelligence

A popular computer puzzle, the game of Minesweeper requires its human players to have a mix of both luck and strategy to succeed. Analyzing these aspects more formally, in our research we assessed the feasibility of a novel methodology based on Reinforcement Learning as an adequate approach to tackle the problem presented by this game. For this purpose we employed Multi-Armed Bandit algorithms which were carefully adapted in order to enable their use to define autonomous computational players, targeting to make the best use of some game peculiarities. After experimental evaluation, results showed that this approach was indeed successful, especially in smaller game boards, such as the standard beginner level. Despite this fact the main contribution of this work is a detailed examination of Minesweeper from a learning perspective, which led to various original insights which are thoroughly discussed.


Hyperbolic Minesweeper is in P

arXiv.org Artificial Intelligence

In (a), the default settings are used (bitruncated order-3 heptagonal tessellation, numbers of adjacent mines are color-coded; some of the mines that the players is sure of are marked red). In (b) we play on an order-3 heptagonal tessellation, and numbers are shown. Minesweeper is a popular game included with many computer systems; it also exists in the puzzle form. In the puzzle form, every cell in a square grid either contains a number or is empty.