 human-agent interaction


How can we assess human-agent interactions? Case studies in software agent design

arXiv.org Artificial Intelligence

LLM-powered agents are both a promising new technology and a source of complexity, where choices about models, tools, and prompting can affect their usefulness. While numerous benchmarks measure agent accuracy across domains, they mostly assume full automation, failing to represent the collaborative nature of real-world use cases. In this paper, we make two major steps towards the rigorous assessment of human-agent interactions. First, we propose PULSE, a framework for more efficient human-centric evaluation of agent designs, which comprises collecting user feedback, training an ML model to predict user satisfaction, and computing results by combining human satisfaction ratings with model-generated pseudo-labels. Second, we deploy the framework on a large-scale web platform built around the open-source software agent OpenHands, collecting in-the-wild usage data across over 15k users. We conduct case studies around how three agent design decisions -- choice of LLM backbone, planning strategy, and memory mechanisms -- impact developer satisfaction rates, yielding practical insights for software agent design. We also show how our framework can lead to more robust conclusions about agent design, reducing confidence intervals by 40% compared to a standard A/B test. Finally, we find substantial discrepancies between in-the-wild results and benchmark performance (e.g., the anti-correlation between results comparing claude-sonnet-4 and gpt-5), underscoring the limitations of benchmark-driven evaluation. Our findings provide guidance for evaluations of LLM agents with humans and identify opportunities for better agent designs.
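The abstract does not give the formula used to combine human satisfaction ratings with model-generated pseudo-labels. As a rough sketch of the general idea only, the following assumes a prediction-powered-style mean estimator: average the pseudo-labels on unrated sessions, then correct that average with the mean residual observed on the rated sessions. The function names, the normal-approximation interval, and the toy data are all illustrative assumptions, not the PULSE implementation.

```python
import math
import statistics


def human_only_estimate(human_ratings, z=1.96):
    """Plain mean of human ratings with a normal-approximation half-width."""
    n = len(human_ratings)
    half_width = z * math.sqrt(statistics.variance(human_ratings) / n)
    return statistics.fmean(human_ratings), half_width


def combined_estimate(human_ratings, preds_on_rated, preds_on_unrated, z=1.96):
    """Assumed estimator: pseudo-label mean plus a bias correction.

    human_ratings    -- 0/1 satisfaction ratings given by users
    preds_on_rated   -- model predictions for those same rated sessions
    preds_on_unrated -- model predictions (pseudo-labels) for unrated sessions
    """
    if len(human_ratings) != len(preds_on_rated):
        raise ValueError("each human rating needs a matching prediction")
    n = len(human_ratings)
    big_n = len(preds_on_unrated)
    # Residuals measure how far the predictor is off where ground truth exists.
    residuals = [y - p for y, p in zip(human_ratings, preds_on_rated)]
    estimate = statistics.fmean(preds_on_unrated) + statistics.fmean(residuals)
    variance = (
        statistics.variance(preds_on_unrated) / big_n
        + statistics.variance(residuals) / n
    )
    return estimate, z * math.sqrt(variance)


if __name__ == "__main__":
    ratings = [1, 0, 1, 0]
    preds_rated = [1.0, 0.0, 1.0, 0.0]   # toy case: predictor is exact here
    preds_unrated = [1.0, 0.0] * 50      # 100 unrated sessions
    print(human_only_estimate(ratings))
    print(combined_estimate(ratings, preds_rated, preds_unrated))
```

In this sketch the interval shrinks only to the extent that the predictor's residuals have lower variance than the raw ratings; a poor predictor gives little or no benefit.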


The Human-or-Machine Issue: Turing-Inspired Reflections on an Everyday Matter

Communications of the ACM

Alan Turing's 1950 paper [37] introduced the famed "imitation game" as a means of determining whether a computer can be considered intelligent, thus informing the definition of machine intelligence. Over the years, the Turing test has been the subject of analysis and discussion, resulting in several variants, and has been reflected upon in retrospective reviews (see, for example, French [10]). Similar tests have been proposed in quite different areas, including automotive, games, urban and industrial planning, biological and biochemical modeling, and odor reproduction. The purposes of such variant tests range from offering practical techniques to discern an agent's identity to serving as a norm, or yardstick, for assessing the quality and fidelity of a model or reproduction process in mirroring the original's properties (see, for example, Harel [11]). Here, we completely sidestep the issue of defining or measuring intelligence, as well as the practical question of whether a machine can be built to replace, or mimic, a person in the performance of some specific task [33].


Agents: An Open-source Framework for Autonomous Language Agents

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents as a promising direction towards artificial general intelligence and release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience. Agents is carefully engineered to support important features including planning, memory, tool usage, multi-agent communication, and fine-grained symbolic control. Agents is user-friendly as it enables non-specialists to build, customize, test, tune, and deploy state-of-the-art autonomous language agents without much coding. The library is also research-friendly as its modularized design makes it easily extensible for researchers. Agents is available at https://github.com/aiwaves-cn/agents.


The Rise and Potential of Large Language Model Based Agents: A Survey

arXiv.org Artificial Intelligence

For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent agents, but they mainly focus on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse scenarios. Due to the versatile capabilities they demonstrate, large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI), offering hope for building general AI agents. Many researchers have leveraged LLMs as the foundation to build AI agents and have achieved significant progress. In this paper, we perform a comprehensive survey on LLM-based agents. We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents. Building upon this, we present a general framework for LLM-based agents, comprising three main components: brain, perception, and action, and the framework can be tailored for different applications. Subsequently, we explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation. Following this, we delve into agent societies, exploring the behavior and personality of LLM-based agents, the social phenomena that emerge from an agent society, and the insights they offer for human society. Finally, we discuss several key topics and open problems within the field. A repository of the related papers is available at https://github.com/WooooDyy/LLM-Agent-Paper-List.


Automatic Thoughts and Facial Expressions in Cognitive Restructuring with Virtual Agents

#artificialintelligence

Cognitive restructuring is a well-established mental health technique for amending automatic thoughts, which are distorted and biased beliefs about a situation, into objective and balanced thoughts. Since virtual agents can be used anytime and anywhere, they are expected to perform cognitive restructuring without being influenced by medical infrastructure or patients' stigma toward mental illness. Unfortunately, since the quantitative analysis of human-agent interaction is still insufficient, the effect on the user's cognitive state remains unclear. We collected interaction data between virtual agents and users to observe the mood improvements associated with changes in automatic thoughts that occur in user cognition and addressed the following two points: (1) implementation of a virtual agent that helps a user identify and evaluate automatic thoughts; (2) identification of the relationship between a user's facial expressions and the extent of the mood improvement subjectively felt by users during the human-agent interaction. We focus on these points because cognitive restructuring by a human therapist starts by identifying automatic thoughts and seeking sufficient evidence to find balanced thoughts (evaluation of automatic thoughts). Therapists also use such non-verbal behaviors as facial expressions to detect changes in a user's mood, which is an important indicator for guidance. Based on the results of this analysis, we provide a technical guidance framework that fully ...


The role of computer voice in the future of speech-based human-computer interaction

#artificialintelligence

As humans, we primarily communicate vocally and aurally. We convey not just linguistic information, but also the complexities of our emotional states and personalities. Aspects of our voice such as tone, rhythm, and pitch are vital to the way we are perceived. In other words, the way we say things matters. With advances in technology and the introduction of social robots, conversational agents, and voice assistants into our lives, we are expanding our interactions to include computer agents, interfaces, and environments.


Just Ask: An Interactive Learning Framework for Vision and Language Navigation

arXiv.org Artificial Intelligence

In the vision and language navigation task, the agent may encounter ambiguous situations that are hard to interpret by just relying on visual information and natural language instructions. We propose an interactive learning framework to endow the agent with the ability to ask for users' help in such situations. As part of this framework, we investigate multiple learning approaches for the agent with different levels of complexity. The simplest model-confusion-based method lets the agent ask questions based on its confusion, relying on the predefined confidence threshold of a next action prediction model. To build on this confusion-based method, the agent is expected to demonstrate more sophisticated reasoning such that it discovers the timing and locations to interact with a human. We achieve this goal using reinforcement learning (RL) with a proposed reward shaping term, which enables the agent to ask questions only when necessary. The success rate can be boosted by at least 15% with only one question asked on average during the navigation. Furthermore, we show that the RL agent is capable of adjusting dynamically to noisy human responses. Finally, we design a continual learning strategy, which can be viewed as a data augmentation method, for the agent to improve further utilizing its interaction history with a human. We demonstrate the proposed strategy is substantially more realistic and data-efficient compared to previously proposed pre-exploration techniques.
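The confusion-based rule described above, where the agent asks for help when its next-action model is not confident enough, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the use of the maximum action probability as the confidence score, and the threshold value of 0.5 are hypothetical and do not come from the paper.

```python
def should_ask(action_probs, confidence_threshold=0.5):
    """Return True when the most likely next action falls below the threshold."""
    return max(action_probs) < confidence_threshold


def choose_step(action_probs, confidence_threshold=0.5):
    """Either ask the human for help or take the most probable action."""
    if should_ask(action_probs, confidence_threshold):
        return ("ask", None)
    best_action = max(range(len(action_probs)), key=action_probs.__getitem__)
    return ("act", best_action)


if __name__ == "__main__":
    print(choose_step([0.7, 0.2, 0.1]))    # confident prediction
    print(choose_step([0.4, 0.35, 0.25]))  # ambiguous prediction
```

A fixed threshold like this is the simple baseline; the abstract's RL variant instead learns when asking is worthwhile, which this sketch does not attempt to reproduce.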


Petri Net Machines for Human-Agent Interaction

#artificialintelligence

Smart speakers and robots are becoming ever more prevalent in our daily lives. These agents are able to execute a wide range of tasks and actions and, therefore, need systems to control their execution. Current state-of-the-art approaches such as (deep) reinforcement learning, however, require vast amounts of training data, which is often hard to come by when interacting with humans. To overcome this issue, most systems still rely on Finite State Machines. We introduce Petri Net Machines, which present a formal definition for state machines based on Petri Nets that are able to execute concurrent actions reliably, execute and interleave several plans at the same time, and provide an easy-to-use modelling language.
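To illustrate why Petri nets suit concurrent actions, here is a minimal sketch of a generic place/transition net in which tokens mark the current state and a transition may fire once all of its input places hold enough tokens. It is not the Petri Net Machine formalism from the paper; the class, the place and transition names, and the greeting scenario are hypothetical.

```python
from collections import Counter


class PetriNet:
    """Generic place/transition net with integer token counts."""

    def __init__(self, marking):
        self.marking = Counter(marking)   # place name -> token count
        self.transitions = {}             # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (Counter(inputs), Counter(outputs))

    def enabled(self):
        """Names of transitions whose input places all hold enough tokens."""
        return sorted(
            name
            for name, (inputs, _) in self.transitions.items()
            if all(self.marking[place] >= k for place, k in inputs.items())
        )

    def fire(self, name):
        if name not in self.enabled():
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        self.marking.subtract(inputs)   # consume input tokens
        self.marking.update(outputs)    # produce output tokens


def build_greeting_net():
    """Hypothetical robot plan: speak and move concurrently, then finish."""
    net = PetriNet({"start": 1})
    net.add_transition("fork", ["start"], ["speak_ready", "move_ready"])
    net.add_transition("speak", ["speak_ready"], ["speak_done"])
    net.add_transition("move", ["move_ready"], ["move_done"])
    net.add_transition("join", ["speak_done", "move_done"], ["finished"])
    return net


if __name__ == "__main__":
    net = build_greeting_net()
    net.fire("fork")
    print(net.enabled())  # both actions are enabled at once
```

After `fork` fires, `speak` and `move` are both enabled and can fire in either order, while `join` stays blocked until both have completed. A plain finite state machine would need a separate state for every interleaving.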


Learning Actions and Action Verbs from Human-Agent Interaction

AAAI Conferences

Learning by interacting with humans is a powerful learning paradigm. In a complex world, learning through self-directed experience alone can be slow, requiring repeated interactions with the environment. Learning from human-agent interaction can reduce the complexity of the learning task by reducing ... Prior work done in learning by instruction (Huffman and Laird, 1995) demonstrated learning systems that focus on agent-initiated interaction, where instruction is directed by impasses arising in a Soar agent. They noted that instructor-initiated interaction is difficult to support because of the likely interruption of the agent's reasoning.


Designing for Human-Agent Interaction

AI Magazine

Most human-computer interfaces can be classified according to two dominant metaphors: (1) agent and (2) environment. In the environment metaphor, a model of the task domain is presented for the user to interact with directly. Norman's 1984 model of HCI is introduced as a reference to organize and evaluate research in human-agent interaction (HAI). A wide variety of heterogeneous research involving HAI is shown to reflect automation of one of the stages of action or evaluation within Norman's model.