believability
Symbolically Scaffolded Play: Designing Role-Sensitive Prompts for Generative NPC Dialogue
Figueiredo, Vanessa, Elumeze, David
Large Language Models (LLMs) promise to transform interactive games by enabling non-player characters (NPCs) to sustain unscripted dialogue. Yet it remains unclear whether constrained prompts actually improve player experience. We investigate this question through The Interview, a voice-based detective game powered by GPT-4o. A within-subjects usability study (N=10) compared high-constraint (HCP) and low-constraint (LCP) prompts, revealing no reliable experiential differences beyond sensitivity to technical breakdowns. Guided by these findings, we redesigned the HCP into a hybrid JSON+RAG scaffold and conducted a synthetic evaluation with an LLM judge, positioned as an early-stage complement to usability testing. Results uncovered a novel pattern: scaffolding effects were role-dependent. The Interviewer (quest-giver NPC) gained stability, while suspect NPCs lost improvisational believability. These findings overturn the assumption that tighter constraints inherently enhance play. Extending fuzzy-symbolic scaffolding, we introduce Symbolically Scaffolded Play, a framework in which symbolic structures are expressed as fuzzy, numerical boundaries that stabilize coherence where needed while preserving improvisation where surprise sustains engagement.
- North America > Canada > Saskatchewan > Regina (0.50)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Education (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
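The abstract above does not publish its prompt format, but the idea of symbolic structures expressed as fuzzy, numerical boundaries can be sketched concretely. In the minimal illustration below, every field name, weight, and the rendering convention are assumptions for illustration, not the paper's actual scaffold:

```python
import json

# Hypothetical role-sensitive prompt scaffold: each behavioural rule carries
# a fuzzy weight in [0, 1] instead of a hard constraint, so coherence can be
# tightened for the quest-giver role while suspects keep room to improvise.
SCAFFOLD = {
    "role": "interviewer",           # quest-giver NPC: needs stability
    "constraints": {
        "stay_on_case_facts": 0.9,   # near-hard boundary
        "reveal_clues_gradually": 0.7,
        "improvise_small_talk": 0.2, # little room for surprise
    },
}

def render_system_prompt(scaffold):
    """Turn the fuzzy scaffold into a JSON block the LLM receives verbatim."""
    header = (f"You are the {scaffold['role']} NPC. Obey each rule "
              "in proportion to its weight (1.0 = always).")
    return header + "\n" + json.dumps(scaffold["constraints"], indent=2)

prompt = render_system_prompt(SCAFFOLD)
```

A suspect NPC would reuse the same schema with lower weights on the coherence rules, which is one way a single scaffold format can behave differently per role.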
React to This (RTT): A Nonverbal Turing Test for Embodied AI
Zhang, Chuxuan, Etesam, Yasaman, Lim, Angelica
We propose an approach to test embodied AI agents for interaction awareness and believability, particularly in scenarios where humans push them to their limits. Turing introduced the Imitation Game as a way to explore the question: "Can machines think?" The Total Turing Test later expanded this concept beyond purely verbal communication, incorporating perceptual and physical interaction. Building on this, we propose a new guiding question: "Can machines react?" and introduce the React to This (RTT) test for nonverbal behaviors, presenting results from an initial experiment. In 1950, Turing [1] proposed the "imitation game" as a way to address the question: "Can machines think?" Since then, numerous attempts have been made to pass this test [2]. One of the earliest systems to highlight how surface-level language mimicry could deceive users was ELIZA [3], developed in 1965.
- North America > United States > California > Santa Clara County > Stanford (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
- Europe > Netherlands > Gelderland > Nijmegen (0.04)
- Europe > France (0.04)
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments
Korkiakoski, Mikko, Sheikhi, Saeid, Nyman, Jesper, Saariniemi, Jussi, Tapio, Kalle, Kostakos, Panos
Advancements in artificial intelligence (AI) have significantly enhanced the realism and interactivity of non-player characters (NPCs) in virtual reality (VR), creating more engaging and believable user experiences. This paper evaluates AI-driven NPCs within a VR interrogation simulator, focusing on their perceived realism, usability, and system performance. The simulator features two AI-powered NPCs (a suspect and a partner), using GPT-4 Turbo to engage participants in a scenario to determine the suspect's guilt or innocence. A user study with 18 participants assessed the system using the System Usability Scale (SUS), Game Experience Questionnaire (GEQ), and a Virtual Agent Believability Questionnaire, alongside latency measurements for speech-to-text (STT), text-to-speech (TTS), OpenAI GPT-4 Turbo, and overall (cycle) latency. Results showed an average cycle latency of 7 seconds, influenced by the increasing conversational context. Believability scored 6.67 out of 10, with high ratings in behavior, social relationships, and intelligence but moderate scores in emotion and personality. The system achieved a SUS score of 79.44, indicating good usability. These findings demonstrate the potential of large language models to improve NPC realism and interaction in VR while highlighting challenges in reducing system latency and enhancing emotional depth. This research contributes to the development of more sophisticated AI-driven NPCs, revealing the need for performance optimization to achieve increasingly immersive virtual experiences.
- Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
- North America > United States (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
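The cycle latency reported above decomposes into the three pipeline stages the study measured (STT, LLM, TTS). A toy model makes the decomposition, and the paper's observation that LLM latency grows with conversational context, explicit; all numbers below are placeholders, not the study's data:

```python
# Total time from end of user speech to start of NPC speech.
def cycle_latency(stt_s, llm_s, tts_s, overhead_s=0.0):
    return stt_s + llm_s + tts_s + overhead_s

# The paper notes LLM latency grows as the conversation context grows; a
# linear toy model (slope and intercept are assumptions) shows the trend.
def llm_latency(context_tokens, base_s=1.5, s_per_1k_tokens=0.8):
    return base_s + s_per_1k_tokens * context_tokens / 1000

# A long conversation (4k tokens of context) pushes the cycle toward the
# ~7 s average the study reports.
total = cycle_latency(stt_s=1.2, llm_s=llm_latency(4000), tts_s=1.0)
```

Framing latency this way clarifies where optimization effort pays off: the context-dependent LLM term dominates late in a session, so context truncation or summarization attacks the largest component.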
Mitigation of gender bias in automatic facial non-verbal behaviors generation
Delbosc, Alice, Ochs, Magalie, Sabouret, Nicolas, Ravenet, Brian, Ayache, Stephane
Research on non-verbal behavior generation for social interactive agents focuses mainly on the believability and synchronization of non-verbal cues with speech. However, existing models, predominantly based on deep learning architectures, often perpetuate biases inherent in the training data. This raises ethical concerns, depending on the intended application of these agents. This paper addresses these issues by first examining the influence of gender on facial non-verbal behaviors. We concentrate on gaze, head movements, and facial expressions. We introduce a classifier capable of discerning the gender of a speaker from their non-verbal cues. This classifier achieves high accuracy on both real behavior data, extracted using state-of-the-art tools, and synthetic data, generated from a model developed in previous work. Building upon this work, we present a new model, FairGenderGen, which integrates a gender discriminator and a gradient reversal layer into our previous behavior generation model. This new model generates facial non-verbal behaviors from speech features, mitigating gender sensitivity in the generated behaviors. Our experiments demonstrate that the classifier, developed in the initial phase, is no longer effective in distinguishing the gender of the speaker from the generated non-verbal behaviors.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > Costa Rica > San José Province > San José (0.05)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
- (3 more...)
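The gradient reversal layer mentioned in the FairGenderGen abstract is a standard adversarial-training device: identity on the forward pass, gradient multiplied by a negative factor on the backward pass, so the generator learns to confuse the gender discriminator. A framework-free sketch of just that mechanism (the class and its scalar interface are illustrative, not the paper's implementation):

```python
# Gradient reversal layer (GRL): forward is the identity, backward flips the
# sign of the incoming gradient and scales it by lambda_, so gradient descent
# on the discriminator's loss pushes the upstream generator to *remove* the
# information the discriminator relies on.
class GradientReversal:
    def __init__(self, lambda_=1.0):
        self.lambda_ = lambda_

    def forward(self, x):
        return x                             # features pass through unchanged

    def backward(self, grad_output):
        return -self.lambda_ * grad_output   # reversed, scaled gradient

grl = GradientReversal(lambda_=0.5)
y = grl.forward(3.0)
g = grl.backward(2.0)
```

In a real model the layer sits between the behavior generator and the gender discriminator; everything downstream trains to classify gender while everything upstream, receiving the reversed gradient, trains to make gender unclassifiable.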
How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation
Xiao, Yang, Cheng, Yi, Fu, Jinlan, Wang, Jiashuo, Li, Wenjie, Liu, Pengfei
Human behavior simulation of AI agents necessitates the agents to possess a quality of believability, which is crucial as it facilitates users in establishing trust toward the agents and streamlines the fulfillment of the agents' goal. While recent advancements in Large Language Model (LLM) based agents have improved human behavior simulation, challenges inherent to LLMs (e.g., long context modeling) can undermine their believability. Consequently, evaluating AI agent believability becomes imperative. Unfortunately, prior research often neglects the negative impacts of LLM deficiencies. To address these gaps, we introduce two metrics for assessing LLM-based agent believability, consistency and robustness, together with a benchmark, SimulateBench, with which we evaluate the consistency and robustness of agents implemented with popular LLMs. We find that agents (i) struggle to accurately depict character information when presented with lengthy profile inputs; (ii) exhibit vulnerability to profile perturbations; and (iii) are significantly affected by certain key factors that impact their overall believability. Code and SimulateBench are public at https://github.com/GAIR-NLP/GPTMan.
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Hong Kong (0.04)
- (5 more...)
- Information Technology > Security & Privacy (0.67)
- Leisure & Entertainment > Games > Computer Games (0.46)
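The two metrics named in the SimulateBench abstract can be given toy definitions to make their intent concrete: consistency checks an agent's answers against the ground-truth character profile, and robustness compares accuracy before and after a profile perturbation. The field names and scoring below are illustrative, not the benchmark's actual formulas:

```python
# Consistency: fraction of profile attributes the agent reports correctly.
def consistency(answers, profile):
    correct = sum(answers.get(k) == v for k, v in profile.items())
    return correct / len(profile)

# Robustness: 1.0 means a perturbation changed nothing; values near 0.0
# mean the agent's accuracy collapsed after the profile was perturbed.
def robustness(acc_original, acc_perturbed):
    return 1.0 - abs(acc_original - acc_perturbed)

profile = {"age": "34", "job": "teacher", "city": "Lyon"}
acc = consistency({"age": "34", "job": "teacher", "city": "Paris"}, profile)
```

The abstract's finding (i), degradation on lengthy profile inputs, corresponds to consistency falling as `profile` grows; finding (ii) corresponds to low robustness under perturbed profiles.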
Grimm in Wonderland: Prompt Engineering with Midjourney to Illustrate Fairytales
The quality of text-to-image generation is continuously improving, yet the boundaries of its applicability are still unclear. In particular, refinement of the text input with the objective of achieving better results - commonly called prompt engineering - so far seems to have not been geared towards work with pre-existing texts. We investigate whether text-to-image generation and prompt engineering could be used to generate basic illustrations of popular fairytales. Using Midjourney v4, we engage in action research with a dual aim: to attempt to generate 5 believable illustrations for each of 5 popular fairytales, and to define a prompt engineering process that starts from a pre-existing text and arrives at an illustration of it. We arrive at a tentative 4-stage process: i) initial prompt, ii) composition adjustment, iii) style refinement, and iv) variation selection. We also discuss three reasons why the generation model struggles with certain illustrations: difficulties with counts, bias from stereotypical configurations and inability to depict overly fantastic situations. Our findings are not limited to the specific generation model and are intended to be generalisable to future ones.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
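The four-stage prompt-engineering process the Grimm in Wonderland abstract arrives at (initial prompt, composition adjustment, style refinement, variation selection) is essentially a string pipeline. A sketch, where the fairytale scene and the adjustment phrases are placeholders and the final stage happens interactively in the Midjourney UI:

```python
# Stages i-iii build the prompt text; stage iv (variation selection) is a
# manual choice among the model's generated candidates, so it is only noted
# in a comment here.
def build_prompt(scene, composition=None, style=None):
    parts = [scene]                   # i) initial prompt from the source text
    if composition:
        parts.append(composition)     # ii) composition adjustment
    if style:
        parts.append(style)           # iii) style refinement
    return ", ".join(parts)           # iv) variation selection: done in the UI

p = build_prompt("Little Red Riding Hood meets the wolf in the forest",
                 composition="two figures, wolf on the left",
                 style="storybook watercolour illustration")
```

Each stage appends rather than rewrites, which mirrors the paper's iterative refinement: earlier decisions stay visible in the prompt while later stages narrow the output.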
Science, Business, and Hype: Machine Learning as a typical case
In this article, I analyze the relation between science and business. I use machine learning as a case that is representative of the main features of such a relation. The analysis is easily extendable to other fields of science.
- Information Technology (0.31)
- Automobiles & Trucks (0.31)
New And Surprising Ways to Be Mean. Adversarial NPCs with Coupled Empowerment Minimisation
Guckelsberger, Christian, Salge, Christoph, Togelius, Julian
Creating Non-Player Characters (NPCs) that can react robustly to unforeseen player behaviour or novel game content is difficult and time-consuming. This hinders the design of believable characters, and the inclusion of NPCs in games that rely heavily on procedural content generation. We have previously addressed this challenge by means of empowerment, a model of intrinsic motivation, and demonstrated how a coupled empowerment maximisation (CEM) policy can yield generic, companion-like behaviour. In this paper, we extend the CEM framework with a minimisation policy to give rise to adversarial behaviour. We conduct a qualitative, exploratory study in a dungeon-crawler game, demonstrating that CEM can exploit the affordances of different content facets in adaptive adversarial behaviour without modifications to the policy. Changes to the level design, underlying mechanics and our character's actions do not threaten our NPC's robustness, but yield new and surprising ways to be mean. Non-Player Characters (NPCs) in video games serve many purposes: they can be quest givers, conversation partners, leaders, sidekicks or other kinds of collaborators [1]. But in many cases they are adversaries. Adversarial NPCs also come in many forms, their behaviour varying according to the game genre, the design affordances, and the underlying algorithms. Treanor et al. [2] make the fundamental distinction between AI as Adversary and AI as Villain. Adversaries are designed to defeat the player without resorting to cheating, e.g. an AI for Chess or Go. The objective of an NPC villain, in contrast, is not to defeat the player but to create an interesting challenge which can be overcome eventually. We refer to both types simply as adversaries.
- North America > United States (0.04)
- Europe > United Kingdom > England > Hertfordshire (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
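Empowerment, the intrinsic-motivation model behind CEM above, measures how much control an agent has over its future states; a common discrete proxy is the log of the n-step reachable-state set. A tiny sketch under that simplification (the transition table and the reachability proxy are illustrative, not the paper's formal channel-capacity definition):

```python
import math

# All states reachable from `state` within `steps` moves.
def reachable(state, steps, transitions):
    frontier, seen = {state}, {state}
    for _ in range(steps):
        frontier = {t for s in frontier for t in transitions.get(s, [])}
        seen |= frontier
    return seen

# Proxy empowerment: log2 of the number of reachable states. An adversarial
# CEM policy would choose the NPC action that minimises the *player's* value
# of this quantity, e.g. by blocking the corridor through node "b".
def empowerment(state, steps, transitions):
    return math.log2(len(reachable(state, steps, transitions)))

T = {"a": ["b", "c"], "b": ["d"], "c": []}
e = empowerment("a", 2, T)
```

The appeal of the construction is exactly what the abstract claims: the policy needs no hand-authored adversarial rules, only a transition model, so content changes automatically reshape what "being mean" looks like.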
Believable Robot Characters
Believability of characters has been an objective in literature, theater, film, and animation. We argue that believable robot characters are important in human-robot interaction, as well. In particular, we contend that believable characters evoke users' social responses that, for some tasks, lead to more natural interactions and are associated with improved task performance. In a dialogue-capable robot, a key to such believability is the integration of a consistent story line, verbal and nonverbal behaviors, and sociocultural context. We describe our work in this area and present empirical results from three robot receptionist test beds that operate "in the wild."
Believable Character Reasoning and a Measure of Self-Confidence for Autonomous Team Actors
Samsonovich, Alexei V. (George Mason University)
This work presents a general-purpose character reasoning model intended for usage by autonomous team actors that are acting as believable characters (e.g., human team actors fall into this category). The idea is that selecting a cast of believable characters can predetermine a solution to an unexpected challenge that the team may be facing in a rescue mission. This approach in certain cases proves more efficient than an alternative approach based on rational decision making and planning, which ignores the question of character believability. This point is illustrated with a simple numerical example in a virtual world paradigm.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Ohio (0.04)
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- (7 more...)