 Agents


Agent-Arena: A General Framework for Evaluating Control Algorithms

arXiv.org Artificial Intelligence

Robotic research is inherently challenging, requiring expertise in diverse environments and control algorithms. Adapting algorithms to new environments often poses significant difficulties, compounded by the need for extensive hyper-parameter tuning in data-driven methods. To address these challenges, we present Agent-Arena, a Python framework designed to streamline the integration, replication, development, and testing of decision-making policies across a wide range of benchmark environments. Unlike existing frameworks, Agent-Arena is uniquely generalised to support all types of control algorithms and is adaptable to both simulation and real-robot scenarios. Please see our GitHub repository https://github.com/halid1020/agent-arena-v0.


Comparing Self-Disclosure Themes and Semantics to a Human, a Robot, and a Disembodied Agent

arXiv.org Artificial Intelligence

As social robots and other artificial agents become more conversationally capable, it is important to understand whether the content and meaning of self-disclosure towards these agents changes depending on the agent's embodiment. In this study, we analysed conversational data from three controlled experiments in which participants self-disclosed to a human, a humanoid social robot, and a disembodied conversational agent. Using sentence embeddings and clustering, we identified themes in participants' disclosures, which were then labelled and explained by a large language model. We subsequently assessed whether these themes and the underlying semantic structure of the disclosures varied by agent embodiment. Our findings reveal strong consistency: thematic distributions did not significantly differ across embodiments, and semantic similarity analyses showed that disclosures were expressed in highly comparable ways. These results suggest that while embodiment may influence human behaviour in human-robot and human-agent interactions, people tend to maintain a consistent thematic focus and semantic structure in their disclosures, whether speaking to humans or artificial interlocutors.


EXCLAIM: An Explainable Cross-Modal Agentic System for Misinformation Detection with Hierarchical Retrieval

arXiv.org Artificial Intelligence

Misinformation continues to pose a significant challenge in today's information ecosystem, profoundly shaping public perception and behavior. Among its various manifestations, Out-of-Context (OOC) misinformation is particularly obscure, as it distorts meaning by pairing authentic images with misleading textual narratives. Existing methods for detecting OOC misinformation predominantly rely on coarse-grained similarity metrics between image-text pairs, which often fail to capture subtle inconsistencies or provide meaningful explainability. While multi-modal large language models (MLLMs) demonstrate remarkable capabilities in visual reasoning and explanation generation, they have not yet demonstrated the capacity to address the complex, fine-grained, cross-modal distinctions necessary for robust OOC detection. To overcome these limitations, we introduce EXCLAIM, a retrieval-based framework designed to leverage external knowledge through a multi-granularity index of multi-modal events and entities. Our approach integrates multi-granularity contextual analysis with a multi-agent reasoning architecture to systematically evaluate the consistency and integrity of multi-modal news content. Comprehensive experiments validate the effectiveness and resilience of EXCLAIM, demonstrating its ability to detect OOC misinformation with 4.3% higher accuracy compared to state-of-the-art approaches, while offering explainable and actionable insights.


Unraveling Human-AI Teaming: A Review and Outlook

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) is advancing at an unprecedented pace, with clear potential to enhance decision-making and productivity. Yet, the collaborative decision-making process between humans and AI remains underdeveloped, often falling short of its transformative possibilities. This paper explores the evolution of AI agents from passive tools to active collaborators in human-AI teams, emphasizing their ability to learn, adapt, and operate autonomously in complex environments. This paradigm shift challenges traditional team dynamics, requiring new interaction protocols, delegation strategies, and responsibility distribution frameworks. Drawing on Team Situation Awareness (SA) theory, we identify two critical gaps in current human-AI teaming research: the difficulty of aligning AI agents with human values and objectives, and the underutilization of AI's capabilities as genuine team members. Addressing these gaps, we propose a structured research outlook centered on four key aspects of human-AI teaming: formulation, coordination, maintenance, and training. Our framework highlights the importance of shared mental models, trust-building, conflict resolution, and skill adaptation for effective teaming. Furthermore, we discuss the unique challenges posed by varying team compositions, goals, and complexities. This paper provides a foundational agenda for future research and practical design of sustainable, high-performing human-AI teams.


The AI Agent Era Requires a New Kind of Game Theory

WIRED

Zico Kolter has a knack for getting artificial intelligence to misbehave in interesting and important ways. His research group at Carnegie Mellon University has discovered numerous methods of tricking, goading, and confusing advanced AI models into being their worst selves. Kolter is a professor at CMU, a technical adviser to Gray Swan, a startup specializing in AI security, and, as of August 2024, a board member at the world's most prominent AI company, OpenAI. In addition to pioneering ways of jailbreaking commercial AI models, Kolter designs his own models that are more secure by nature. As AI becomes more autonomous, Kolter believes that AI agents may pose unique challenges--especially when they start talking to one another.


SkillFlow: Efficient Skill and Code Transfer Through Communication in Adapting AI Agents

arXiv.org Artificial Intelligence

AI agents are autonomous systems that can execute specific tasks based on predefined programming. Here, we present SkillFlow, a modular, technology-agnostic framework that allows agents to expand their functionality in an ad-hoc fashion by acquiring new skills from their environment or other agents. We present a theoretical model that examines under which conditions this framework would be beneficial, and we then explore SkillFlow's ability to accelerate task completion and lead to lower cumulative costs in a real-world application, namely scheduling agents for calendar events. We demonstrate that within a few iterations, SkillFlow leads to considerable (24.8%, p-value = $6.4\times10^{-3}$) gains in time and cost, especially when the communication cost is high. Finally, we draw analogies from well-studied biological systems and compare this framework to that of lateral gene transfer, a significant process of adaptation and evolution in novel environments.


Decentralizing AI Memory: SHIMI, a Semantic Hierarchical Memory Index for Scalable Agent Reasoning

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) and vector-based search have become foundational tools for memory in AI systems, yet they struggle with abstraction, scalability, and semantic precision - especially in decentralized environments. We present SHIMI (Semantic Hierarchical Memory Index), a unified architecture that models knowledge as a dynamically structured hierarchy of concepts, enabling agents to retrieve information based on meaning rather than surface similarity. SHIMI organizes memory into layered semantic nodes and supports top-down traversal from abstract intent to specific entities, offering more precise and explainable retrieval. Critically, SHIMI is natively designed for decentralized ecosystems, where agents maintain local memory trees and synchronize them asynchronously across networks. We introduce a lightweight sync protocol that leverages Merkle-DAG summaries, Bloom filters, and CRDT-style conflict resolution to enable partial synchronization with minimal overhead. Through benchmark experiments and use cases involving decentralized agent collaboration, we demonstrate SHIMI's advantages in retrieval accuracy, semantic fidelity, and scalability - positioning it as a core infrastructure layer for decentralized cognitive systems.
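
The top-down traversal described in this abstract can be illustrated with a minimal sketch. It is not SHIMI's implementation: the `Node` class, the `retrieve` function, and the example tree are hypothetical, and plain Jaccard overlap between word sets stands in for whatever semantic matching the paper actually uses. The sketch only shows the general idea of descending from abstract concepts to specific entities and returning the path taken, which is what makes the retrieval explainable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One concept in a hypothetical memory hierarchy."""
    label: str
    terms: frozenset                      # words summarising this concept
    children: list = field(default_factory=list)
    entities: list = field(default_factory=list)  # concrete memories stored here


def overlap(query_terms, node):
    """Jaccard overlap; a stand-in (assumption) for semantic similarity."""
    union = query_terms | node.terms
    return len(query_terms & node.terms) / len(union) if union else 0.0


def retrieve(root, query):
    """Descend from the root, following the best-matching child at each level.

    Stops when a node has no children or no child matches the query at all.
    Returns the path of labels visited and the entities at the final node.
    """
    query_terms = frozenset(query.lower().split())
    node, path = root, [root.label]
    while node.children:
        best = max(node.children, key=lambda child: overlap(query_terms, child))
        if overlap(query_terms, best) == 0.0:
            break  # nothing below is relevant; stay at the current level
        node = best
        path.append(node.label)
    return path, node.entities


# A tiny illustrative tree (all names are made up for this example).
invoices = Node("invoices", frozenset({"invoice", "due"}), entities=["invoice-2024-03"])
budgets = Node("budgets", frozenset({"budget", "forecast"}), entities=["q3-budget"])
finance = Node("finance", frozenset({"invoice", "payment", "budget"}), children=[invoices, budgets])
travel = Node("travel", frozenset({"flight", "hotel", "booking"}))
root = Node("memory", frozenset(), children=[finance, travel])

path, found = retrieve(root, "pay the invoice due")
print(path, found)
```

For the query above, the traversal visits `memory`, then `finance`, then `invoices`, and returns the entity stored there. A query with no matching terms stops at the root and returns no entities.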


Safe Interaction via Monte Carlo Linear-Quadratic Games

arXiv.org Artificial Intelligence

Safety is critical during human-robot interaction. But -- because people are inherently unpredictable -- it is often difficult for robots to plan safe behaviors. Instead of relying on our ability to anticipate humans, here we identify robot policies that are robust to unexpected human decisions. We achieve this by formulating human-robot interaction as a zero-sum game, where (in the worst case) the human's actions directly conflict with the robot's objective. Solving for the Nash Equilibrium of this game provides robot policies that maximize safety and performance across a wide range of human actions. Existing approaches attempt to find these optimal policies by leveraging Hamilton-Jacobi analysis (which is intractable) or linear-quadratic approximations (which are inexact). By contrast, in this work we propose a computationally efficient and theoretically justified method that converges towards the Nash Equilibrium policy. Our approach (which we call MCLQ) leverages linear-quadratic games to obtain an initial guess at safe robot behavior, and then iteratively refines that guess with a Monte Carlo search. Not only does MCLQ provide real-time safety adjustments, but it also enables the designer to tune how conservative the robot is -- preventing the system from focusing on unrealistic human behaviors. Our simulations and user study suggest that this approach advances safety in terms of both computation time and expected performance. See videos of our experiments here: https://youtu.be/KJuHeiWVuWY.
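
The two-stage idea in this abstract (a linear-quadratic initial guess, then Monte Carlo refinement against worst-case human actions) can be sketched on a toy one-step problem. This is not the MCLQ algorithm from the paper: the scalar dynamics, the cost function, the sampling scheme, and every name below are assumptions chosen only to make the structure concrete. The bound `d_max` on the human's action plays the role of a conservativeness knob here, which is also an assumption.

```python
import random


def true_cost(x, u, d):
    """Hypothetical non-quadratic robot cost for next state x + u + d."""
    nxt = x + u + d
    return nxt ** 2 + 0.5 * u ** 2 + 0.1 * abs(nxt) ** 3


def worst_case(x, u, human_samples):
    """Zero-sum view: the human picks the sampled action that hurts the robot most."""
    return max(true_cost(x, u, d) for d in human_samples)


def lq_guess(x, r=0.5):
    """Minimiser of the quadratic surrogate (x + u)^2 + r * u^2 with d = 0."""
    return -x / (1.0 + r)


def refine(x, d_max=0.3, iters=50, n_candidates=20, sigma=0.5, seed=0):
    """Monte Carlo search around the LQ guess, scored by worst-case cost."""
    rng = random.Random(seed)
    # Sampled human actions, plus both extremes of the allowed range.
    human_samples = [rng.uniform(-d_max, d_max) for _ in range(30)] + [-d_max, d_max]
    best_u = lq_guess(x)
    initial_cost = best_cost = worst_case(x, best_u, human_samples)
    for _ in range(iters):
        for _ in range(n_candidates):
            candidate = best_u + rng.gauss(0.0, sigma)
            cost = worst_case(x, candidate, human_samples)
            if cost < best_cost:          # keep a candidate only if it improves
                best_u, best_cost = candidate, cost
        sigma *= 0.9                      # narrow the search as it converges
    return best_u, best_cost, initial_cost


u, refined_cost, initial_cost = refine(x=2.0)
print(f"u = {u:.3f}, worst-case cost {initial_cost:.3f} -> {refined_cost:.3f}")
```

Because a candidate replaces the current action only when it lowers the worst-case cost, the refined cost can never exceed the cost of the initial LQ guess on the same set of sampled human actions. Increasing `d_max` makes the robot guard against larger human deviations.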


To Give or Not to Give? The Impacts of Strategically Withheld Recourse

arXiv.org Artificial Intelligence

Yatong Chen (MPI for Intelligent Systems, Tübingen AI Center, Tübingen, Germany), Andrew Estornell (Bytedance Research), Yevgeniy Vorobeychik (Washington University in Saint Louis), Yang Liu (University of California, Santa Cruz)

Abstract: Individuals often aim to reverse undesired outcomes in interactions with automated systems, like loan denials, by either implementing system-recommended actions (recourse), or manipulating their features. While providing recourse benefits users and enhances system utility, it also provides information about the decision process that can be used for more effective strategic manipulation, especially when the individuals collectively share such information with each other. We show that this tension leads rational utility-maximizing systems to frequently withhold recourse, resulting in decreased population utility, particularly impacting sensitive groups. To mitigate these effects, we explore ...


Rolling Horizon Coverage Control with Collaborative Autonomous Agents

arXiv.org Artificial Intelligence

Savvas Papaioannou, Panayiotis Kolios, Theocharis Theocharides, Christos G. Panayiotou and Marios M. Polycarpou

Abstract: This work proposes a coverage controller that enables an aerial team of distributed autonomous agents to collaboratively generate non-myopic coverage plans over a rolling finite horizon, aiming to cover specific points on the surface area of a 3D object of interest. The collaborative coverage problem, formulated as a distributed model predictive control problem, optimizes the agents' motion and camera control inputs while considering inter-agent constraints aimed at reducing work redundancy. The proposed coverage controller integrates constraints based on light-path propagation techniques to predict the parts of the object's surface that are visible with regard to the agents' future anticipated states. This work also demonstrates how complex, non-linear visibility assessment constraints can be converted into logical expressions that are embedded as binary constraints into a mixed-integer optimization framework. The proposed approach has been demonstrated through simulations and practical applications for inspecting buildings with unmanned aerial vehicles (UAVs).

Introduction: The interest in swarm systems, such as systems utilizing multiple autonomous unmanned aerial vehicles (UAVs), has skyrocketed over the last few decades. Rapid advancements in robotics, automation and artificial intelligence, coupled with the decreasing costs of electronic components, have fuelled a remarkable surge in interest towards the technologies and applications of swarming systems. This work addresses the challenge of coverage planning and control using multiple collaborative intelligent autonomous agents, specifically autonomous UAVs. Coverage planning [1] is crucial in several application domains, including search and rescue operations and critical infrastructure inspections. It is one of the essential functionalities that can notably enhance the autonomy of existing swarming systems, enabling them to execute fully automated missions in the aforementioned scenarios. In coverage planning, our objective is to design trajectories that allow a team of autonomous mobile agents to comprehensively cover a designated area or points of interest. Concurrently, we aim to optimize a specific mission goal, such as minimizing the mission's duration and the energy consumption of the agents. This work introduces a coverage control framework that optimizes both the kinematic and camera control inputs of multiple UAV agents simultaneously.