StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns

Wan, Luanbo, Ma, Weizhi

arXiv.org Artificial Intelligence

Long-term memory (LTM) is essential for large language models (LLMs) to achieve autonomous intelligence in complex, evolving environments. Despite increasing efforts in memory-augmented and retrieval-based architectures, there remains a lack of standardized benchmarks to systematically evaluate LLMs' long-term memory abilities. Existing benchmarks still face challenges in evaluating knowledge retention and dynamic sequential reasoning, and often lack flexibility, all of which limits their effectiveness in assessing models' LTM capabilities. To address these gaps, we propose a novel benchmark framework based on interactive fiction games, featuring dynamically branching storylines with complex reasoning structures. These structures simulate real-world scenarios by requiring LLMs to navigate hierarchical decision trees, where each choice triggers cascading dependencies across multi-turn interactions. Our benchmark emphasizes two distinct settings to test reasoning complexity: one with immediate feedback upon incorrect decisions, and the other requiring models to independently trace back and revise earlier choices after failure. As part of this benchmark, we also construct a new dataset designed to test LLMs' LTM within narrative-driven environments. We further validate the effectiveness of our approach through detailed experiments. Experimental results demonstrate the benchmark's ability to robustly and reliably assess LTM in LLMs.


PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents

Yang, Qisen, Wang, Zekun, Chen, Honghui, Wang, Shenzhi, Pu, Yifan, Gao, Xin, Huang, Wenhao, Song, Shiji, Huang, Gao

arXiv.org Artificial Intelligence

Psychological measurement is essential for mental health, self-understanding, and personal development. Traditional methods, such as self-report scales and psychologist interviews, often face challenges with engagement and accessibility. While game-based and LLM-based tools have been explored to improve user interest and automate assessment, they struggle to balance engagement with generalizability. In this work, we propose PsychoGAT (Psychological Game AgenTs) to achieve a generic gamification of psychological assessment. The main insight is that powerful LLMs can function both as adept psychologists and innovative game designers. By incorporating LLM agents into designated roles and carefully managing their interactions, PsychoGAT can transform any standardized scales into personalized and engaging interactive fiction games. To validate the proposed method, we conduct psychometric evaluations to assess its effectiveness and employ human evaluators to examine the generated content across various psychological constructs, including depression, cognitive distortions, and personality traits. Results demonstrate that PsychoGAT serves as an effective assessment tool, achieving statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity. Moreover, human evaluations confirm PsychoGAT's enhancements in content coherence, interactivity, interest, immersion, and satisfaction.


As spicy as you want it: interactive fiction games put forward a new kind of narrative

The Guardian

In late May, in a $58m Bel Air hilltop mansion, influencers, reality stars and other Angelenos milled around Netflix-branded TV screens displaying choices to be made: Are you a Gemini or a Capricorn? What color are your eyes? The party marked the launch of the streaming giant's latest offering: a slate of Choose Your Own Adventure-style mobile games inspired by its most popular reality television shows, and attendees were selecting the traits of their digital avatars. "I better be a character!" Selling Sunset star Jason Oppenheim exclaimed as he paused near the top of a staircase that led to a reflecting pool with the Netflix logo floating in it.


Chekhov's Gun Recognition

Tikhonov, Alexey, Yamshchikov, Ivan P.

arXiv.org Artificial Intelligence

Chekhov's gun is a dramatic principle stating that every element in a story must be necessary, and irrelevant elements should be removed. This paper presents a new natural language processing task -- Chekhov's gun recognition (CGR) -- the recognition of entities that are pivotal for the development of the plot. Though similar to classical Named Entity Recognition (NER), it has profound differences and is crucial for narrative processing tasks, since Chekhov's guns have a profound impact on the causal relationships in a story. The paper presents a new benchmark dataset for the CGR task that includes 5550 descriptions with one or more Chekhov's Guns in each, and validates the task on two more datasets available in the natural language processing (NLP) literature. "One must never place a loaded rifle on the stage if it isn't going to go off. It's wrong to make promises you don't mean to keep."


Bringing Stories Alive: Generating Interactive Fiction Worlds

Ammanabrolu, Prithviraj, Cheung, Wesley, Tu, Dan, Broniec, William, Riedl, Mark O.

arXiv.org Artificial Intelligence

World building forms the foundation of any task that requires narrative intelligence. In this work, we focus on procedurally generating interactive fiction worlds--text-based worlds that players "see" and "talk to" using natural language. Generating these worlds requires referencing everyday and thematic commonsense priors in addition to being semantically consistent, interesting, and coherent throughout. Using existing story plots as inspiration, we present a method that first extracts a partial knowledge graph encoding basic information regarding world structure such as locations and objects. This knowledge graph is then automatically completed utilizing thematic knowledge and used to guide a neural language generation model that fleshes out the rest of the world. We perform human participant-based evaluations, testing our neural model's ability to extract and fill in a knowledge graph and to generate language conditioned on it against rule-based and human-made baselines. Our code is available at https://github.com/


How ClickHole Crafts the Web's Most Hilarious Adventure Games

WIRED

I was dueling Anthony Bourdain to decide which one of us was more human. I had arrived at this moment via a surreal and silly journey that began with a question: "Can you pass the Turing Test?" I'd found this rabbit hole on ClickHole, the Buzzfeed-parodying offshoot of The Onion that has, however improbably, become a tiny haven for hilarious, often surprisingly complex Choose Your Own Adventure-style interactive fiction games. Clickventures, as they're called, are exercises in absurdist escalation. They typically begin modestly, but quickly shift into the unexpected and ridiculous. To pass the Turing Test, I journeyed from a home computer office to an ersatz version of a Pokémon gym on the world stage.