 wordle


Playpen: An Environment for Exploring Learning Through Conversational Interaction

Horst, Nicola, Mazzaccara, Davide, Schmidt, Antonia, Sullivan, Michael, Momentè, Filippo, Franceschetti, Luca, Sadler, Philipp, Hakimov, Sherzod, Testoni, Alberto, Bernardi, Raffaella, Fernández, Raquel, Koller, Alexander, Lemon, Oliver, Schlangen, David, Giulianelli, Mario, Suglia, Alessandro

arXiv.org Artificial Intelligence

Interaction between learner and feedback-giver has come into focus recently for post-training of Large Language Models (LLMs), through the use of reward models that judge the appropriateness of a model's response. In this paper, we investigate whether Dialogue Games -- goal-directed and rule-governed activities driven predominantly by verbal actions -- can also serve as a source of feedback signals for learning. We introduce Playpen, an environment for off- and online learning through Dialogue Game self-play, and investigate a representative set of post-training methods: supervised fine-tuning; direct alignment (DPO); and reinforcement learning with GRPO. We experiment with post-training a small LLM (Llama-3.1-8B-Instruct), evaluating performance on unseen instances of training games as well as unseen games, and on standard benchmarks. We find that imitation learning through SFT improves performance on unseen instances, but negatively impacts other skills, while interactive learning with GRPO shows balanced improvements without loss of skills. We release the framework and the baseline training setups to foster research in the promising new direction of learning in (synthetic) interaction.


AGI Is Coming... Right After AI Learns to Play Wordle

Shekkizhar, Sarath, Cosentino, Romain

arXiv.org Artificial Intelligence

This paper investigates multimodal agents, in particular, OpenAI's Computer-User Agent (CUA), trained to control and complete tasks through a standard computer interface, similar to humans. We evaluated the agent's performance on the New York Times Wordle game to elicit model behaviors and identify shortcomings. Our findings revealed a significant discrepancy in the model's ability to recognize colors correctly depending on the context. The model had a 5.36% success rate over several hundred runs across a week of Wordle. Despite the immense enthusiasm surrounding AI agents and their potential to usher in Artificial General Intelligence (AGI), our findings reinforce the fact that even simple tasks present substantial challenges for today's frontier AI models. We conclude with a discussion of the potential underlying causes, implications for future development, and research directions to improve these AI systems.


CogSimulator: A Model for Simulating User Cognition & Behavior with Minimal Data for Tailored Cognitive Enhancement

Bian, Weizhen, Zhou, Yubo, Luo, Yuanhang, Mo, Ming, Liu, Siyan, Gong, Yikai, Wan, Renjie, Luo, Ziyuan, Wang, Aobo

arXiv.org Artificial Intelligence

The interplay between cognition and gaming, notably through educational games that enhance cognitive skills, has garnered significant attention in recent years. This research introduces the CogSimulator, a novel algorithm for simulating user cognition in small-group settings with minimal data, exemplified here by the educational game Wordle. The CogSimulator employs the Wasserstein-1 distance and coordinate search optimization for hyperparameter tuning, enabling precise few-shot predictions in new game scenarios. Comparative experiments with the Wordle dataset illustrate that our model surpasses most conventional machine learning models in mean Wasserstein-1 distance, mean squared error, and mean accuracy, showcasing its efficacy in cognitive enhancement through tailored game design.
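As a rough illustration of the metric named in this abstract, the sketch below computes the Wasserstein-1 distance between two discrete distributions on the same unit-spaced support. Treating the distributions as shares of players finishing in 1 to 6 attempts (plus a "failed" bucket) is an assumption made here for illustration; the abstract does not say how the Wordle dataset is represented, and the function name `wasserstein_1` is hypothetical.

```python
from itertools import accumulate


def wasserstein_1(p, q):
    """Wasserstein-1 distance between two distributions on the same
    unit-spaced support (e.g. attempt counts 1, 2, ..., 6, failed).

    In one dimension with unit spacing, W1 equals the sum of absolute
    differences between the two cumulative distribution functions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same support")
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


# Hypothetical predicted vs. observed attempt distributions (made-up numbers).
predicted = [0.00, 0.05, 0.25, 0.35, 0.25, 0.08, 0.02]
observed = [0.01, 0.06, 0.23, 0.33, 0.24, 0.10, 0.03]
print(round(wasserstein_1(predicted, observed), 4))
```

A smaller value means less probability mass has to be moved, and over a shorter distance, to turn one distribution into the other, which is why the metric suits ordered outcomes such as attempt counts.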


Semantic, Orthographic, and Morphological Biases in Humans' Wordle Gameplay

Liang, Gary, Kabbara, Adam, Liu, Cindy, Luo, Ronaldo, Kim, Kina, Guerzhoy, Michael

arXiv.org Artificial Intelligence

We show that human players' gameplay in the game of Wordle is influenced by the semantics, orthography, and morphology of the player's previous guesses. We demonstrate this influence by comparing actual human players' guesses to near-optimal guesses, showing that human players' guesses are biased to be similar to previous guesses semantically, orthographically, and morphologically.


On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Klissarov, Martin, Hjelm, Devon, Toshev, Alexander, Mazoure, Bogdan

arXiv.org Artificial Intelligence

Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we investigate the capabilities of Large Language Models (LLMs) for reinforcement learning (RL) across a diversity of interactive domains. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models to train an agent with RL. Our results show that, even without task-specific fine-tuning, LLMs excel at reward modeling. In particular, crafting rewards through artificial intelligence (AI) feedback yields the most generally applicable approach and can enhance performance by improving credit assignment and exploration. Finally, in environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities while mitigating catastrophic forgetting, further broadening their utility in sequential decision-making tasks.


Strategic Insights in Human and Large Language Model Tactics at Word Guessing Games

Rikters, Matīss, Reinsone, Sanita

arXiv.org Artificial Intelligence

At the beginning of 2022, a simplistic word-guessing game took the world by storm and was further adapted to many languages beyond the original English version. In this paper, we examine the strategies of daily word-guessing game players that have evolved during a period of over two years. A survey gathered from 25% of frequent players reveals their strategies and motivations for continuing the daily journey. We also explore the capability of several popular open-access large language model systems and open-source models at comprehending and playing the game in two different languages. Results highlight certain models' struggles to maintain the correct guess length and their tendency to generate repetitions, as well as hallucinations of non-existent words and inflections.


Wordle Was the Game the World Needed. How Do You Make the Next One?

Slate

The Puzzmo game designer speaks with Felix Salmon about how to make addicting, viral pastimes that turn a profit. They discuss what made Wordle such a breakout hit, how to make games for both bad and good players, and the strained relationship between art and profit.


Selecting Seed Words for Wordle using Character Statistics

de Silva, Nisansa

arXiv.org Artificial Intelligence

Wordle, a word-guessing game, rose to global popularity in January 2022. The goal of the game is to guess a five-letter English word within six tries. Each try provides the player with hints by means of colour-changing tiles, which indicate whether a given character is part of the solution and, if it is, whether it is in the correct position. Numerous attempts have been made to find the best starting word and the best strategy to solve the daily Wordle. This study uses character statistics of five-letter words to determine the best three starting words.
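The hint rule described in this abstract, and one simple way character statistics could rank starting words, can be sketched as follows. This is a minimal illustration, not the paper's method: the `G`/`Y`/`-` encoding, the function names, and the "count how many words contain each letter" score are all assumptions made here.

```python
from collections import Counter


def feedback(guess: str, target: str) -> str:
    """Per-letter hints: G = correct position, Y = in the solution but
    elsewhere, - = not in the solution (duplicates handled by count)."""
    guess, target = guess.lower(), target.lower()
    hints = ["-"] * len(guess)
    # Target letters that were not matched in place are still available
    # to be reported as "present elsewhere".
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            hints[i] = "G"
    for i, g in enumerate(guess):
        if hints[i] == "-" and remaining[g] > 0:
            hints[i] = "Y"
            remaining[g] -= 1
    return "".join(hints)


def rank_seed_words(words):
    """Rank words by how many words in the list share their distinct
    letters (an illustrative character-statistics score)."""
    counts = Counter(ch for w in words for ch in set(w))
    return sorted(words, key=lambda w: sum(counts[ch] for ch in set(w)),
                  reverse=True)


print(feedback("crane", "cigar"))                  # GYY--
print(rank_seed_words(["crane", "slate", "pious"]))
```

Scoring only distinct letters penalises words with repeated characters, which matches the common intuition that a good opener should test five different letters.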


Wordle: A Microcosm of Life. Luck, Skill, Cheating, Loyalty, and Influence!

Dilger, James P.

arXiv.org Artificial Intelligence

Wordle is a popular online word game offered by the New York Times (nytimes.com). Currently there are some 2 million players of the English version worldwide. Players have 6 attempts to guess the daily word (target word), and after each attempt, the player receives color-coded information about the correctness and position of each letter in the guess. After either a successful completion of the puzzle or the final unsuccessful attempt, software can assess the player's luck and skill using Information Theory and can display data for the first, second, ..., sixth guesses of a random sample of all players. Recently, I discovered that the latter data is presented in a format that can easily be copied and pasted into a spreadsheet. I compiled data on Wordle players' first guesses from May 2023 to August 2023 and inferred some interesting information about Wordle players. A) Every day, about 0.2-0.5% of players solve the puzzle in one attempt. Because the odds of guessing the correct one of the 2,315 possible target words at random are 0.043%, this implies that 4,000 - 10,000 players cheat by obtaining the target word outside of playing the game! B) At least 1/3 of the players have a favorite starting word, or cycle through several. And even though players should be aware that target words are never repeated, most players appear to remain loyal to their starting word even after its appearance as a target word. C) On August 15, 2023, about 30,000 players abruptly changed their starting word, presumably based on a crossword puzzle clue! Wordle players can be influenced! This study goes beyond social media postings, surveys, and Google Trends to provide solid, quantitative evidence about cheating in Wordle.
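The arithmetic behind point A can be checked with a short sketch. It uses only figures quoted in the abstract (about 2 million players, 2,315 target words, 0.2-0.5% first-attempt solves); treating every first guess as a uniform random draw from the target list is a simplifying assumption, and the variable names are illustrative.

```python
TARGET_WORDS = 2_315      # possible target words (from the abstract)
PLAYERS = 2_000_000       # "some 2 million players" (from the abstract)

# Chance of hitting the target on the first try with a uniform random guess.
p_random = 1 / TARGET_WORDS
print(f"random first-guess odds: {p_random:.3%}")         # 0.043%

# Players expected to solve in one attempt by luck alone (assumption:
# every first guess is a uniform draw from the target list).
expected_lucky = PLAYERS * p_random
print(f"expected lucky solvers: {expected_lucky:.0f}")     # 864

# Observed share of one-attempt solvers, converted to head counts.
observed_low = PLAYERS * 0.002
observed_high = PLAYERS * 0.005
print(f"observed solvers: {observed_low:.0f} to {observed_high:.0f}")
```

Under these assumptions the 4,000 - 10,000 figure corresponds to the raw count of one-attempt solvers; the uniform-guess baseline of roughly 860 players is several times smaller, which is the gap the abstract attributes to cheating.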


The Trendy New Trivia Game That's Like Wordle for Straight Men

Slate

We are in the midst of an unprecedented, intergenerational phone-game renaissance. Wordle has become a pillar of the New York Times brand, newspapers everywhere are resurrecting their crossword backpage, and Words With Friends has essentially transformed into a dating app. These games are designed to be approachably mainstream--every English speaker alive can deduce a five-letter word with six chances--but unfortunately, I am a man of unconventional taste. If I'm going to entertain a daily dose of potpourri, I need something weirder, more challenging, and better suited for the precise category of useless knowledge that occupies my brain. That's why the sports-trivia game Immaculate Grid has become a fixture of my morning routine.