pawn
Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions
Veeriah, Vivek, Barbero, Federico, Chiam, Marcus, Feng, Xidong, Dennis, Michael, Pachauri, Ryan, Tumiel, Thomas, Obando-Ceron, Johan, Shi, Jiaxin, Hou, Shaobo, Singh, Satinder, Tomašev, Nenad, Zahavy, Tom
The rapid advancement of Generative AI has raised significant questions about its ability to produce creative and novel outputs. Our recent work investigates this question within the domain of chess puzzles and presents an AI system designed to generate puzzles characterized by aesthetic appeal, novelty, and counter-intuitive, unique solutions. We briefly discuss our method below and refer the reader to the technical paper for more details. To assess our system's creativity, we presented a curated booklet of AI-generated puzzles to three world-renowned experts: International Master of chess composition Amatzia Avni, Grandmaster Jonathan Levitt, and Grandmaster Matthew Sadler. All three are noted authors on chess aesthetics and the evolving role of computers in the game. They were asked to select their favorites and explain what made them appealing, considering qualities such as creativity, level of challenge, and aesthetic design.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Federal Workers Are Being Used as Pawns in the Shutdown
"People are scared," says one federal worker. "Is WIRED hiring?" jokes another. Federal workers have grown accustomed to a specific kind of dread over the past year. As of July, more than 150,000 federal workers had resigned from their roles since President Donald Trump took office for the second time. Tens of thousands were also fired. For the past few months, it seemed like this bloodletting was over--but that all changed on Friday. Thousands of employees at eight government agencies were subjected to RIFs, or reductions in force--the government's formal process of laying off federal workers. According to a court filing from the Office of Management and Budget (OMB) on Friday, this latest round of firings has affected more than 4,000 federal employees. The court filing also claimed that the administration targeted the Treasury and the Department of Health and Human Services the hardest, hacking away at a combined 2,500 jobs across the two agencies and the entire Washington, DC, office of the Centers for Disease Control and Prevention. The Department of Education culled nearly its entire team handling special education, CNN reported on Tuesday. At the Environmental Protection Agency, the Department of Energy, and the Department of Housing and Urban Development, cuts ranged from a few dozen to several hundred jobs, according to the same filing. "Who says their goal is to traumatize people?" says one IRS worker, referencing private speeches given by Russell Vought, the head of OMB and a key architect of the Heritage Foundation's Project 2025 who has been the public face of the job-cutting.
- North America > United States > District of Columbia > Washington (0.25)
- North America > United States > New York (0.05)
- North America > United States > Louisiana (0.05)
- (3 more...)
Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection
Miralles-González, Pablo, Huertas-Tato, Javier, Martín, Alejandro, Camacho, David
The rapid advancement in large language models (LLMs) has significantly enhanced their ability to generate coherent and contextually relevant text, raising concerns about the misuse of AI-generated content and making it critical to detect it. However, the task remains challenging, particularly in unseen domains or with unfamiliar LLMs. Leveraging LLM next-token distribution outputs offers a theoretically appealing approach for detection, as they encapsulate insights from the models' extensive pre-training on diverse corpora. Despite its promise, zero-shot methods that attempt to operationalize these outputs have met with limited success. We hypothesize that one of the problems is that they use the mean to aggregate next-token distribution metrics across tokens, when some tokens are naturally easier or harder to predict and should be weighted differently. Based on this idea, we propose the Perplexity Attention Weighted Network (PAWN), which uses the LLM's last hidden states and token positions to weight a sum of features derived from next-token distribution metrics across the sequence. Although not zero-shot, our method allows us to cache the last hidden states and next-token distribution metrics on disk, greatly reducing the training resource requirements. In-distribution, PAWN performs competitively with, and sometimes better than, the strongest baselines (fine-tuned LMs) with a fraction of their trainable parameters. Our model also generalizes better to unseen domains and source models, with smaller variability in the decision boundary across distribution shifts. It is also more robust to adversarial attacks, and if the backbone has multilingual capabilities, it generalizes reasonably to languages not seen during supervised training, with LLaMA3-1B reaching a mean macro-averaged F1 score of 81.46% in cross-validation across nine languages.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Spain > Galicia > Madrid (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (19 more...)
- Information Technology > Security & Privacy (0.48)
- Media > News (0.46)
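The aggregation idea behind PAWN can be sketched in a few lines: instead of averaging next-token metrics uniformly across the sequence, derive one weight per token from the model's hidden states and take a weighted sum. The following is a minimal NumPy illustration with random placeholder values, not the authors' implementation (PAWN's actual features, architecture, and training are described in the paper).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mean_aggregate(token_features):
    # Zero-shot baselines aggregate next-token metrics with a uniform mean.
    return token_features.mean(axis=0)

def pawn_aggregate(token_features, hidden_states, w):
    # PAWN-style idea: derive a per-token weight from the LLM's last
    # hidden states, then take a weighted sum of the features instead
    # of a uniform mean.
    scores = hidden_states @ w        # (T,) one logit per token
    weights = softmax(scores)         # normalized over sequence length
    return weights @ token_features   # (F,) weighted feature vector

rng = np.random.default_rng(0)
T, H, F = 6, 8, 3                     # tokens, hidden size, features
feats = rng.normal(size=(T, F))       # e.g. log-prob, entropy, rank per token
hidden = rng.normal(size=(T, H))      # stand-in for LLM last hidden states
w = rng.normal(size=H)                # stand-in for a learned projection

baseline = mean_aggregate(feats)
weighted = pawn_aggregate(feats, hidden, w)
```

Because the hidden states and per-token metrics depend only on the frozen backbone, they can be computed once and cached, which is what keeps training cheap.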
Estimating the number of reachable positions in Minishogi
Ishii, Sotaro, Tanaka, Tetsuro
To investigate the feasibility of strongly solving Minishogi (Gogo Shogi), it is necessary to know the number of its reachable positions from the initial position. However, there currently remains a significant gap between the lower and upper bounds of the value, since checking the legality of a Minishogi position is difficult. In this paper, the authors estimate the number of reachable positions by generating candidate positions using uniform random sampling and measuring the proportion of those reachable by a series of legal moves from the initial position. The experimental results reveal that the number of reachable Minishogi positions is approximately $2.38\times 10^{18}$.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- Asia > Japan > Honshū > Tōhoku > Aomori Prefecture > Aomori (0.04)
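The estimation procedure described above is a standard Monte Carlo hit-rate calculation: sample candidate positions uniformly, test each one for reachability, and scale the observed proportion by the size of the candidate space. A minimal sketch on a toy domain (not actual Minishogi positions, whose reachability check is the genuinely hard part):

```python
import random

def estimate_reachable(candidate_space_size, sample_position, is_reachable,
                       n_samples, seed=0):
    # Monte Carlo estimate in the spirit of the paper: draw candidate
    # positions uniformly at random, test each for reachability, and
    # scale the observed hit rate by the size of the candidate space.
    rng = random.Random(seed)
    hits = sum(is_reachable(sample_position(rng)) for _ in range(n_samples))
    return candidate_space_size * hits / n_samples

# Toy stand-in (NOT Minishogi): candidates are the integers 0..999_999
# and the "reachable" ones are the multiples of 4, so the true count
# is exactly 250_000.
est = estimate_reachable(
    candidate_space_size=1_000_000,
    sample_position=lambda rng: rng.randrange(1_000_000),
    is_reachable=lambda p: p % 4 == 0,
    n_samples=20_000,
)
```

The estimate's standard error shrinks as $1/\sqrt{n}$, so tightening the bounds on the real $\approx 2.38\times10^{18}$ figure is mostly a question of how cheaply each reachability test can be performed.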
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents
Costarelli, Anthony, Allen, Mat, Hauksson, Roman, Sodunke, Grace, Hariharan, Suhas, Cheng, Carlson, Li, Wenjie, Yadav, Arjun
Large language models have demonstrated remarkable few-shot performance on many natural language understanding tasks. Despite several demonstrations of using large language models in complex, strategic scenarios, there is no comprehensive framework for evaluating agents' performance across the various types of reasoning found in games. To address this gap, we introduce GameBench, a cross-domain benchmark for evaluating strategic reasoning abilities of LLM agents. We focus on 9 different game environments, each covering at least one axis of key reasoning skill identified in strategy games, and select games for which strategy explanations are unlikely to form a significant portion of models' pretraining corpora. Our evaluations use GPT-3 and GPT-4 in their base form along with two scaffolding frameworks designed to enhance strategic reasoning ability: Chain-of-Thought (CoT) prompting and Reasoning Via Planning (RAP). Our results show that none of the tested models match human performance, and at worst GPT-4 performs worse than random action. CoT and RAP both improve scores, but not to human-comparable levels.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Texas (0.04)
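The "worse than random action" comparison implies an evaluation loop that pits each agent against baselines in the same environment and measures average score. A toy sketch of such a loop, using a hypothetical minimal environment interface (actions plus a scoring rule; GameBench's real API will differ):

```python
import random

class MatchingPennies:
    # Toy two-player, simultaneous-move environment. The interface
    # (an actions list plus a score function) is hypothetical and only
    # illustrates the evaluation loop, not GameBench's real API.
    actions = ["heads", "tails"]

    def score(self, a1, a2):
        return 1 if a1 == a2 else 0   # player 1 wins on a match

def evaluate(agent, opponent, env, episodes, seed=0):
    # Pit two agents against each other; return the first agent's win rate.
    rng = random.Random(seed)
    wins = sum(env.score(agent(env, rng), opponent(env, rng))
               for _ in range(episodes))
    return wins / episodes

random_agent = lambda env, rng: rng.choice(env.actions)   # random-action baseline
heads_agent = lambda env, rng: "heads"                    # fixed-strategy agent

win_rate = evaluate(heads_agent, random_agent, MatchingPennies(), episodes=2000)
```

Against a uniformly random opponent any fixed strategy wins about half the time here, which is exactly the kind of random-action reference point the benchmark's "worse than random" finding is measured against.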
Pushing Buttons: What makes Dragon's Dogma 2 a fiery breath of fresh air
I love when a game properly captures me, to the extent that I'm thinking about it throughout the day while going about my real life. It doesn't happen very often these days, because I have played too many games in the past 30 years and am becoming immune to their most common spells. When it does happen, it's usually because a game does something I haven't seen before – like Zelda: Tears of the Kingdom last year, with its madcap contraptions. Or sometimes – as with Dragon's Dogma 2, which I am very much still playing after reviewing it last week – it's because it does something I have seen before but not for a very long time. In the 12 years between the original Dragon's Dogma and this sequel, the only game that has come close to recapturing its chaotic and stubbornly idiosyncratic brand of fantasy action role-playing was Elden Ring.
A Game of Pawns
Avni, Guy, Ghorpade, Pranav, Guha, Shibashis
We introduce and study pawn games, a class of two-player zero-sum turn-based graph games. A turn-based graph game proceeds by placing a token on an initial vertex, and whoever controls the vertex on which the token is located, chooses its next location. This leads to a path in the graph, which determines the winner. Traditionally, the control of vertices is predetermined and fixed. The novelty of pawn games is that control of vertices changes dynamically throughout the game as follows. Each vertex of a pawn game is owned by a pawn. In each turn, the pawns are partitioned between the two players, and the player who controls the pawn that owns the vertex on which the token is located, chooses the next location of the token. Control of pawns changes dynamically throughout the game according to a fixed mechanism. Specifically, we define several grabbing-based mechanisms in which control of at most one pawn transfers at the end of each turn. We study the complexity of solving pawn games, where we focus on reachability objectives and parameterize the problem by the mechanism that is being used and by restrictions on pawn ownership of vertices. On the positive side, even though pawn games are exponentially-succinct turn-based games, we identify several natural classes that can be solved in PTIME. On the negative side, we identify several EXPTIME-complete classes, where our hardness proofs are based on a new class of games called Lock & Key games, which may be of independent interest.
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Asia > India > Tamil Nadu > Chennai (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (2 more...)
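Pawn games generalize classical turn-based reachability games, in which vertex control is fixed. That fixed-control special case is the natural starting point and is solvable in PTIME by the standard attractor computation, sketched below (the paper's grabbing mechanisms change who controls a vertex between turns, which is what pushes some classes to EXPTIME).

```python
def attractor(vertices, edges, owner, target):
    # Standard attractor computation for a turn-based reachability game
    # with FIXED vertex control (the classical special case that pawn
    # games generalize). Returns the set of vertices from which player 0
    # can force the token into `target`.
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in vertices:
            if v in attr:
                continue
            succs = edges.get(v, [])
            if owner[v] == 0 and any(s in attr for s in succs):
                attr.add(v)          # player 0 picks a winning successor
                changed = True
            elif owner[v] == 1 and succs and all(s in attr for s in succs):
                attr.add(v)          # player 1 cannot avoid the target
                changed = True
    return attr

# Toy game: player 0 owns a; player 1 owns b and c; target is {t}.
vertices = ["a", "b", "c", "t"]
edges = {"a": ["b", "c"], "b": ["t"], "c": ["c"]}
owner = {"a": 0, "b": 1, "c": 1, "t": 0}
win = attractor(vertices, edges, owner, {"t"})
```

In the example, player 1 is forced from b into t, so player 0 wins from a by moving there; c is a safe sink for player 1 and stays outside the winning region.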
Learning to Play Chess from Textbooks (LEAP): a Corpus for Evaluating Chess Moves based on Sentiment Analysis
Alrdahi, Haifa, Batista-Navarro, Riza
Learning chess strategies has been investigated widely, with most studies focussing on learning from previous games using search algorithms. Chess textbooks encapsulate grandmaster knowledge, explain playing strategies, and require a smaller search space compared to traditional chess agents. This paper examines chess textbooks as a new knowledge source for enabling machines to learn how to play chess -- a resource that has not been explored previously. We developed the LEAP corpus, a first-of-its-kind heterogeneous dataset combining structured data (chess move notations and board states) and unstructured data (textual descriptions), collected from a chess textbook containing 1,164 sentences discussing strategic moves from 91 games. We first labelled the sentences based on their relevance, i.e., whether they discuss a move. Each relevant sentence was then labelled according to its sentiment towards the described move. We performed empirical experiments assessing the performance of various transformer-based baseline models for sentiment analysis. Our results demonstrate the feasibility of employing transformer-based sentiment analysis models for evaluating chess moves, with the best-performing model obtaining a weighted micro F_1 score of 68%. Finally, we synthesised the LEAP corpus to create a larger dataset, which can be used as a solution to the limited textual resources in the chess domain.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom (0.14)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- (12 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
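The two-stage annotation scheme (relevance, then sentiment towards the move) can be illustrated with a trivial keyword baseline. This is only a sketch of the labelling pipeline's shape; the paper's actual models are fine-tuned transformers, and the regex and keyword lists below are illustrative assumptions.

```python
import re

# Coarse SAN (standard algebraic notation) matcher; good enough for a demo.
SAN = re.compile(r"\b(?:[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8]|O-O(?:-O)?)\b")
POSITIVE = {"strong", "excellent", "good", "best", "brilliant"}
NEGATIVE = {"weak", "mistake", "blunder", "bad", "dubious"}

def label_sentence(sentence):
    # Two-stage scheme mirroring the corpus annotation: first relevance
    # (does the sentence discuss a move?), then sentiment towards it.
    # A keyword baseline for illustration, not the paper's transformers.
    if not SAN.search(sentence):
        return ("irrelevant", None)
    words = set(sentence.lower().replace(".", " ").split())
    if words & POSITIVE:
        return ("relevant", "positive")
    if words & NEGATIVE:
        return ("relevant", "negative")
    return ("relevant", "neutral")
```

A baseline this crude mostly shows why the task is non-trivial: sentiment about a move is often implied by the surrounding analysis rather than stated with an evaluative keyword.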
The Value of Chess Squares
Gupta, Aditya, Maharaj, Shiva, Polson, Nicholas, Sokolov, Vadim
We propose a neural network-based approach to calculate the value of a chess square-piece combination. Our model takes a triplet (Color, Piece, Square) as an input and calculates a value that measures the advantage/disadvantage of having this piece on this square. Our methods build on recent advances in chess AI, and can accurately assess the worth of positions in a game of chess. The conventional approach assigns fixed values to pieces $(K=\infty,\; Q=9,\; R=5,\; B=3,\; N=3,\; P=1)$. We enhance this analysis by introducing marginal valuations. We use deep Q-learning to estimate the parameters of our model. We demonstrate our method by examining the positioning of Knights and Bishops, and also provide valuable insights into the valuation of pawns. Finally, we conclude by suggesting potential avenues for future research.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
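The (Color, Piece, Square) input lends itself to a simple one-hot encoding (2 + 6 + 64 dimensions) fed into a small network. The sketch below uses random placeholder weights and a made-up architecture purely to show the shape of such a model; the paper's actual network is trained with deep Q-learning.

```python
import numpy as np

COLORS = ["white", "black"]
PIECES = ["K", "Q", "R", "B", "N", "P"]

def encode(color, piece, square):
    # One-hot encode a (Color, Piece, Square) triplet: 2 + 6 + 64 dims.
    file, rank = ord(square[0]) - ord("a"), int(square[1]) - 1
    x = np.zeros(2 + 6 + 64)
    x[COLORS.index(color)] = 1.0
    x[2 + PIECES.index(piece)] = 1.0
    x[8 + 8 * rank + file] = 1.0
    return x

def value(x, W1, b1, w2, b2):
    # Tiny two-layer network mapping the triplet encoding to a scalar
    # value. The weights below are random placeholders; the paper
    # trains such a model with deep Q-learning.
    h = np.maximum(0.0, W1 @ x + b1)    # ReLU hidden layer
    return float(w2 @ h + b2)

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(16, 72)), np.zeros(16)
w2, b2 = 0.1 * rng.normal(size=16), 0.0
v = value(encode("white", "N", "f3"), W1, b1, w2, b2)
```

A trained model of this shape yields exactly the "marginal valuation" the abstract describes: the same Knight gets a different value on f3 than on a1, rather than a fixed 3 points everywhere.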
Improving Chess Commentaries by Combining Language Models with Symbolic Reasoning Engines
Lee, Andrew, Wu, David, Dinan, Emily, Lewis, Mike
Despite many recent advancements in language modeling, state-of-the-art language models lack grounding in the real world and struggle with tasks involving complex reasoning. Meanwhile, advances in the symbolic reasoning capabilities of AI have led to systems that outperform humans in games like chess and Go (Silver et al., 2018). Chess commentary provides an interesting domain for bridging these two fields of research, as it requires reasoning over a complex board state and providing analyses in natural language. In this work we demonstrate how to combine symbolic reasoning engines with controllable language models to generate chess commentaries. We conduct experiments to demonstrate that our approach generates commentaries that are preferred by human judges over previous baselines.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Michigan (0.04)
- (4 more...)
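The core move of the paper is grounding: the symbolic engine supplies verifiable facts (evaluation, best line), which constrain what the language model says. A minimal sketch of folding engine output into a generation prompt; the field names and template are illustrative assumptions, not the paper's actual format, and in practice the evaluation would come from an engine such as Stockfish rather than hard-coded numbers.

```python
def build_commentary_prompt(move_san, eval_cp, best_line):
    # Sketch of grounding a language model in engine analysis: fold the
    # symbolic engine's evaluation (in centipawns) and its principal
    # variation into the prompt so the generated commentary can only
    # reference concrete, verified facts about the position.
    verdict = ("favors White" if eval_cp > 50
               else "favors Black" if eval_cp < -50
               else "is roughly equal")
    return (
        f"Move played: {move_san}\n"
        f"Engine evaluation: {eval_cp / 100:+.2f} pawns (position {verdict})\n"
        f"Best continuation: {' '.join(best_line)}\n"
        "Write one sentence of commentary consistent with the analysis above:"
    )

prompt = build_commentary_prompt("Nf3", 35, ["d5", "c4", "e6"])
```

The controllable-generation part then amounts to conditioning the language model on this prompt, so hallucinated claims ("White is winning a piece") are less likely when the engine says the position is level.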