Agents
A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
Cherep, Manuel, Ma, Chengtian, Xu, Abigail, Shaked, Maya, Maes, Pattie, Singh, Nikhil
Environments built for people are increasingly operated by a new class of economic actors: LLM-powered software agents making decisions on our behalf. These decisions range from our purchases to travel plans to medical treatment selection. Current evaluations of these agents largely focus on task competence, but we argue for a deeper assessment: how these agents choose when faced with realistic decisions. We introduce ABxLab, a framework for systematically probing agentic choice through controlled manipulations of option attributes and persuasive cues. We apply this to a realistic web-based shopping environment, where we vary prices, ratings, and psychological nudges, all of which are factors long known to shape human choice. We find that agent decisions shift predictably and substantially in response, revealing that agents are strongly biased choosers even without being subject to the cognitive constraints that shape human biases. This susceptibility reveals both risk and opportunity: risk, because agentic consumers may inherit and amplify human biases; opportunity, because consumer choice provides a powerful testbed for a behavioral science of AI agents, just as it has for the study of human behavior. We release our framework as an open benchmark for rigorous, scalable evaluation of agent decision-making.
A multi-agent control framework for co-adaptation in brain-computer interfaces
In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user's neural response. Feedback to the user provides information which permits the neural tuning to also adapt. We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. In simulation we characterize how decoding performance improves as the neural encoding and adaptive decoder optimize, qualitatively resembling experimentally demonstrated closed-loop improvement. We then propose a novel, modified decoder update rule which is aware of the fact that the encoder is also changing and show it can improve simulated co-adaptation dynamics.
Diverse Randomized Agents Vote to Win
We investigate the power of voting among diverse, randomized software agents. With teams of computer Go agents in mind, we develop a novel theoretical model of two-stage noisy voting that builds on recent work in machine learning. This model allows us to reason about a collection of agents with different biases (determined by the first-stage noise models), which, furthermore, apply randomized algorithms to evaluate alternatives and produce votes (captured by the second-stage noise models). We analytically demonstrate that a uniform team, consisting of multiple instances of any single agent, must make a significant number of mistakes, whereas a diverse team converges to perfection as the number of agents grows. Our experiments, which pit teams of computer Go agents against strong agents, provide evidence for the effectiveness of voting when agents are diverse.
Fairness in Multi-Agent Sequential Decision-Making
We define a fairness solution criterion for multi-agent decision-making problems, where agents have local interests. This new criterion aims to maximize the worst performance of agents with consideration on the overall performance. We develop a simple linear programming approach and a more scalable game-theoretic approach for computing an optimal fairness policy. This game-theoretic approach formulates this fairness optimization as a two-player, zero-sum game and employs an iterative algorithm for finding a Nash equilibrium, corresponding to an optimal fairness policy.
MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning
Liu, Hongjun, Zhu, Yinghao, Wang, Yuhui, Long, Yitao, Lai, Zeyu, Yu, Lequan, Zhao, Chen
Recent progress in multimodal large language models (MLLMs) has demonstrated promising performance on medical benchmarks and in preliminary trials as clinical assistants. Yet, our pilot audit of diagnostic cases uncovers a critical failure mode: instability in early evidence interpretation precedes hallucination, creating branching reasoning trajectories that cascade into globally inconsistent conclusions. This highlights the need for clinical reasoning agents that constrain stochasticity and hallucination while producing auditable decision flows. We introduce MedMMV, a controllable multimodal multi-agent framework for reliable and verifiable clinical reasoning. MedMMV stabilizes reasoning through diversified short rollouts, grounds intermediate steps in a structured evidence graph under the supervision of a Hallucination Detector, and aggregates candidate paths with a Combined Uncertainty scorer. On six medical benchmarks, MedMMV improves accuracy by up to 12.7% and, more critically, demonstrates superior reliability. Blind physician evaluations confirm that MedMMV substantially increases reasoning truthfulness without sacrificing informational content. By controlling instability through a verifiable, multi-agent process, our framework provides a robust path toward deploying trustworthy AI systems in high-stakes domains like clinical decision support.
Transduction is All You Need for Structured Data Workflows
Gliozzo, Alfio, Khan, Naweed, Constantinides, Christodoulos, Mihindukulasooriya, Nandana, Defosse, Nahuel, Rossiello, Gaetano, Lee, Junkyu
This paper introduces Agentics, a functional agentic AI framework for building LLM-based structured data workflow pipelines. Designed for both research and practical applications, Agentics offers a new data-centric paradigm in which agents are embedded within data types, enabling logical transduction between structured states. This design shifts the focus toward principled data modeling, providing a declarative language where data types are directly exposed to large language models and composed through transductions triggered by type connections. We present a range of structured data workflow tasks and empirical evidence demonstrating the effectiveness of this approach, including data wrangling, text-to-SQL semantic parsing, and domain-specific multiple-choice question answering. The open source Agentics is available at https://github.com/IBM/Agentics.
HeDA: An Intelligent Agent System for Heatwave Risk Discovery through Automated Knowledge Graph Construction and Multi-layer Risk Propagation Analysis
Wang, Yiquan, Huang, Tin-Yeh, Gao, Qingyun, Zhang, Jialin
Heatwaves pose complex cascading risks across interconnected climate, social, and economic systems, but knowledge fragmentation in scientific literature hinders comprehensive understanding of these risk pathways. We introduce HeDA (Heatwave Discovery Agent), an intelligent multi-agent system designed for automated scientific discovery through knowledge graph construction and multi-layer risk propagation analysis. HeDA processes over 10,247 academic papers to construct a comprehensive knowledge graph with 23,156 nodes and 89,472 relationships, employing novel multi-layer risk propagation analysis to systematically identify overlooked risk transmission pathways. Our system achieves 78.9% accuracy on complex question-answering tasks, outperforming state-of-the-art baselines including GPT-4 by 13.7%. Critically, HeDA successfully discovered five previously unidentified high-impact risk chains, such as the pathway where a heatwave leads to a water demand surge, resulting in industrial water restrictions and ultimately causing small business disruption, which were validated through historical case studies and domain expert review. This work presents a new paradigm for AI-driven scientific discovery, providing actionable insights for developing more resilient climate adaptation strategies.
Scaling Generalist Data-Analytic Agents
Qiao, Shuofei, Zhao, Yanqiu, Qiu, Zhisong, Wang, Xiaobin, Zhang, Jintian, Bin, Zhao, Zhang, Ningyu, Jiang, Yong, Xie, Pengjun, Huang, Fei, Chen, Huajun
Data-analytic agents are emerging as a key catalyst for automated scientific discovery and for the vision of Innovating AI. Current approaches, however, rely heavily on prompt engineering over proprietary models, while open-source models struggle to face diverse-format, large-scale data files and long-horizon, multi-step reasoning that real-world analytics demands. This paper introduces DataMind, a scalable data synthesis and agent training recipe designed to build generalist data-analytic agents. DataMind tackles three key challenges in building open-source data-analytic agents, including insufficient data resources, improper training strategy, and unstable code-based multi-turn rollout. Concretely, DataMind applies 1) a fine-grained task taxonomy and a recursive easy-to-hard task composition mechanism to increase the diversity and difficulty of synthesized queries; 2) a knowledge-augmented trajectory sampling strategy followed by model-based and rule-based filtering; 3) a dynamically adjustable training objective combining both SFT and RL losses; 4) a memory-frugal and stable code-based multi-turn rollout framework. Built on DataMind, we curate DataMind-12K, a high-quality trajectory set spanning diverse domains, task categories, and data file formats for data-analytic tasks. Trained on DataMind-12K, our DataMind-14B achieves state-of-the-art with an average score of 71.16% on multiple data analysis benchmarks, outperforming the strongest proprietary baselines DeepSeek-V3.1 and GPT-5. Our DataMind-7B also performs best among all open-source models with a score of 68.10%. We also incorporate some empirical insights gained from our exploratory trials into the analysis experiments, aiming to provide actionable insights about agentic training for the community. We will release DataMind-12K and DataMind-7B,14B for the community's future research.
Safety-Critical Input-Constrained Nonlinear Intercept Guidance in Multiple Engagement Zones
Ranjan, Praveen Kumar, Sinha, Abhinav, Cao, Yongcan
This paper presents an input-constrained nonlinear guidance law to address the problem of intercepting a stationary target in contested environments with multiple defending agents. Contrary to prior approaches that rely on explicit knowledge of defender strategies or utilize conservative safety conditions based on a defender's range, our work characterizes defender threats geometrically through engagement zones that delineate inevitable interception regions. Outside these engagement zones, the interceptor remains invulnerable. The proposed guidance law switches between a repulsive safety maneuver near these zones and a pursuit maneuver outside their influence. To deal with multiple engagement zones, we employ a smooth minimum function (log-sum-exponent approximation) that aggregates threats from all the zones while prioritizing the most critical threats. Input saturation is modeled and embedded in the non-holonomic vehicle dynamics so the controller respects actuator limits while maintaining stability. Numerical simulations with several defenders demonstrate the proposed method's ability to avoid engagement zones and achieve interception across diverse initial conditions.
When Autonomous Vehicle Meets V2X Cooperative Perception: How Far Are We?
Guo, An, Zhang, Shuoxiao, Tang, Enyi, Gao, Xinyu, Pang, Haomin, Tian, Haoxiang, Mu, Yanzhou, Wen, Wu, Fang, Chunrong, Chen, Zhenyu
With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) cooperative perception has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. V2X cooperative perception systems are software systems characterized by diverse sensor types and cooperative agents, varying fusion schemes, and operation under different communication conditions. Therefore, their complex composition gives rise to numerous operational challenges. Furthermore, when cooperative perception systems produce erroneous predictions, the types of errors and their underlying causes remain insufficiently explored. To bridge this gap, we take an initial step by conducting an empirical study of V2X cooperative perception. To systematically evaluate the impact of cooperative perception on the ego vehicle's perception performance, we identify and analyze six prevalent error patterns in cooperative perception systems. We further conduct a systematic evaluation of the critical components of these systems through our large-scale study and identify the following key findings: (1) The LiDAR-based cooperation configuration exhibits the highest perception performance; (2) Vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication exhibit distinct cooperative perception performance under different fusion schemes; (3) Increased cooperative perception errors may result in a higher frequency of driving violations; (4) Cooperative perception systems are not robust against communication interference when running online. Our results reveal potential risks and vulnerabilities in critical components of cooperative perception systems. We hope that our findings can better promote the design and repair of cooperative perception systems.