Goto

Collaborating Authors

 Agents


Depth Matters: Multimodal RGB-D Perception for Robust Autonomous Agents

arXiv.org Artificial Intelligence

Autonomous agents that rely purely on perception to make real-time control decisions require efficient and robust architectures. In this work, we demonstrate that augmenting RGB input with depth information significantly enhances our agents' ability to predict steering commands compared to using RGB alone. We benchmark lightweight recurrent controllers that leverage the fused RGB-D features for sequential decision-making. To train our models, we collect high-quality data using a small-scale autonomous car controlled by an expert driver via a physical steering wheel, capturing varying levels of steering difficulty. Our models, trained under diverse configurations, were successfully deployed on real hardware. Specifically, our findings reveal that the early fusion of depth data results in a highly robust controller, which remains effective even with frame drops and increased noise levels, without compromising the network's focus on the task.


AI Agents in Cryptoland: Practical Attacks and No Silver Bullet

arXiv.org Artificial Intelligence

The integration of AI agents with Web3 ecosystems harnesses their complementary potential for autonomy and openness, yet also introduces underexplored security risks, as these agents dynamically interact with financial protocols and immutable smart contracts. This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios. We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces, including input channels, memory modules, and external data feeds. Through empirical analysis of ElizaOS, a decentralized AI agent framework for automated Web3 operations, we demonstrate how adversaries can manipulate context by injecting malicious instructions into prompts or historical interaction records, leading to unintended asset transfers and protocol violations which could be financially devastating. Our findings indicate that prompt-based defenses are insufficient, as malicious inputs can corrupt an agent's stored context, creating cascading vulnerabilities across interactions and platforms. This research highlights the urgent need to develop AI agents that are both secure and fiduciarily responsible.


Nonparametric Bellman Mappings for Value Iteration in Distributed Reinforcement Learning

arXiv.org Artificial Intelligence

This paper introduces novel Bellman mappings (B-Maps) for value iteration (VI) in distributed reinforcement learning (DRL), where multiple agents operate over a network without a centralized fusion node. Each agent constructs its own nonparametric B-Map for VI while communicating only with direct neighbors to achieve consensus. These B-Maps operate on Q-functions represented in a reproducing kernel Hilbert space, enabling a nonparametric formulation that allows for flexible, agent-specific basis function design. Unlike existing DRL methods that restrict information exchange to Q-function estimates, the proposed framework also enables agents to share basis information in the form of covariance matrices, capturing additional structural details. A theoretical analysis establishes linear convergence rates for both Q-function and covariance-matrix estimates toward their consensus values. The optimal learning rates for consensus-based updates are dictated by the ratio of the smallest positive eigenvalue to the largest one of the network's Laplacian matrix. Furthermore, each nodal Q-function estimate is shown to lie very close to the fixed point of a centralized nonparametric B-Map, effectively allowing the proposed DRL design to approximate the performance of a centralized fusion center. Numerical experiments on two well-known control problems demonstrate the superior performance of the proposed nonparametric B-Maps compared to prior methods. Notably, the results reveal a counter-intuitive finding: although the proposed approach involves greater information exchange -- specifically through the sharing of covariance matrices -- it achieves the desired performance with lower cumulative communication cost than existing DRL schemes, highlighting the crucial role of basis information in accelerating the learning process.


Shlomo Zilberstein wins the 2025 ACM/SIGAI Autonomous Agents Research Award

AIHub

This prestigious award is made for excellence in research in the area of autonomous agents. It is intended to recognize researchers in autonomous agents whose current work is an important influence on the field. Professor Shlomo Zilberstein was recognised for his work establishing the field of decentralized Markov Decision Processes (DEC-MDPs), laying the groundwork for decision-theoretic planning in multi-agent systems and multi-agent reinforcement learning (MARL). The selection committee noted that these contributions have become a cornerstone of multi-agent decision-making, influencing researchers and practitioners alike. Shlomo Zilberstein is Professor of Computer Science and former Associate Dean of Research at the University of Massachusetts Amherst. He is a Fellow of AAAI and the ACM, and has received numerous awards, including the UMass Chancellor's Medal, the IFAAMAS Influential Paper Award, and the AAAI Distinguished Service Award.


ECLAIR: Enhanced Clarification for Interactive Responses in an Enterprise AI Assistant

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown remarkable progress in understanding and generating natural language across various applications. However, they often struggle with resolving ambiguities in real-world, enterprise-level interactions, where context and domain-specific knowledge play a crucial role. In this demonstration, we introduce ECLAIR (Enhanced CLArification for Interactive Responses), a multi-agent framework for interactive disambiguation. ECLAIR enhances ambiguous user query clarification through an interactive process where custom agents are defined, ambiguity reasoning is conducted by the agents, clarification questions are generated, and user feedback is leveraged to refine the final response. When tested on real-world customer data, ECLAIR demonstrates significant improvements in clarification question generation compared to standard few-shot methods.


Data Spatial Programming

arXiv.org Artificial Intelligence

We introduce a novel programming model, Data Spatial Programming, which extends the semantics of Object-Oriented Programming (OOP) by introducing new class-like constructs called archetypes. These archetypes encapsulate spatial relationships between data entities and execution flow in a structured manner, enabling more expressive and semantically rich computations over interconnected data structures. By formalizing the relationships between data elements in space, our approach allows for more intuitive modeling of complex systems where the topology of connections is essential to the underlying computational model. This paradigm addresses limitations in traditional OOP when representing dynamically evolving networks, agent-based systems, and other spatially-oriented computational problems.


EmpathyAgent: Can Embodied Agents Conduct Empathetic Actions?

arXiv.org Artificial Intelligence

Empathy is fundamental to human interactions, yet it remains unclear whether embodied agents can provide human-like empathetic support. Existing works have studied agents' tasks solving and social interactions abilities, but whether agents can understand empathetic needs and conduct empathetic behaviors remains overlooked. To address this, we introduce EmpathyAgent, the first benchmark to evaluate and enhance agents' empathetic actions across diverse scenarios. EmpathyAgent contains 10,000 multimodal samples with corresponding empathetic task plans and three different challenges. To systematically evaluate the agents' empathetic actions, we propose an empathy-specific evaluation suite that evaluates the agents' empathy process. We benchmark current models and found that exhibiting empathetic actions remains a significant challenge. Meanwhile, we train Llama3-8B using EmpathyAgent and find it can potentially enhance empathetic behavior. By establishing a standard benchmark for evaluating empathetic actions, we hope to advance research in empathetic embodied agents. Our code and data are publicly available at https://github.com/xinyan-cxy/EmpathyAgent.


MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration

arXiv.org Artificial Intelligence

Multi-agent collaboration among models has shown promise in reasoning tasks but is underexplored in long-form generation tasks like summarization and question-answering. We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement, i.e., revising model-generated outputs to remove factual inconsistencies. We investigate how iterative collaboration among multiple instances and types of large language models (LLMs) enhances subtasks in the refinement process, such as error detection, critiquing unfaithful sentences, and making corrections based on critiques. We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing. Additionally, reframing critiquing and refinement as reranking rather than generation tasks improves multi-agent performance. We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance on three summarization datasets as well as on long-form question answering, demonstrating the effectiveness and generalizability of our recipe.


Challenges and Trends in Egocentric Vision: A Survey

arXiv.org Artificial Intelligence

With the rapid development of artificial intelligence technologies and wearable devices, egocentric vision understanding has emerged as a new and challenging research direction, gradually attracting widespread attention from both academia and industry. Egocentric vision captures visual and multimodal data through cameras or sensors worn on the human body, offering a unique perspective that simulates human visual experiences. This paper provides a comprehensive survey of the research on egocentric vision understanding, systematically analyzing the components of egocentric scenes and categorizing the tasks into four main areas: subject understanding, object understanding, environment understanding, and hybrid understanding. We explore in detail the sub-tasks within each category. We also summarize the main challenges and trends currently existing in the field. Furthermore, this paper presents an overview of high-quality egocentric vision datasets, offering valuable resources for future research. By summarizing the latest advancements, we anticipate the broad applications of egocentric vision technologies in fields such as augmented reality, virtual reality, and embodied intelligence, and propose future research directions based on the latest developments in the field.


Multi-Agent Actor-Critic with Harmonic Annealing Pruning for Dynamic Spectrum Access Systems

arXiv.org Artificial Intelligence

Multi-Agent Deep Reinforcement Learning (MADRL) has emerged as a powerful tool for optimizing decentralized decision-making systems in complex settings, such as Dynamic Spectrum Access (DSA). However, deploying deep learning models on resource-constrained edge devices remains challenging due to their high computational cost. To address this challenge, in this paper, we present a novel sparse recurrent MARL framework integrating gradual neural network pruning into the independent actor global critic paradigm. Additionally, we introduce a harmonic annealing sparsity scheduler, which achieves comparable, and in certain cases superior, performance to standard linear and polynomial pruning schedulers at large sparsities. Our experimental investigation demonstrates that the proposed DSA framework can discover superior policies, under diverse training conditions, outperforming conventional DSA, MADRL baselines, and state-of-the-art pruning techniques.