Agents
Towards Human-AI-Robot Collaboration and AI-Agent based Digital Twins for Parkinson's Disease Management: Review and Outlook
Hizeh, Hassan, Chighri, Rim, Rahman, Muhammad Mahboob Ur, Bahloul, Mohamed A., Muqaibel, Ali, Al-Naffouri, Tareq Y.
The current body of research on Parkinson's disease (PD) screening, monitoring, and management has evolved along two largely independent trajectories. The first research community focuses on multimodal sensing of PD-related biomarkers using noninvasive technologies such as inertial measurement units (IMUs), force/pressure insoles, electromyography (EMG), electroencephalography (EEG), speech and acoustic analysis, and RGB/RGB-D motion capture systems. These studies emphasize data acquisition, feature extraction, and machine learning-based classification for PD screening, diagnosis, and disease progression modeling. In parallel, a second research community has concentrated on robotic intervention and rehabilitation, employing socially assistive robots (SARs), robot-assisted rehabilitation (RAR) systems, and virtual reality (VR)-integrated robotic platforms for improving motor and cognitive function, enhancing social engagement, and supporting caregivers. Despite the complementary goals of these two domains, their methodological and technological integration remains limited, with minimal data-level or decision-level coupling between the two. With the advent of advanced artificial intelligence (AI), including large language models (LLMs) and agentic AI systems, a unique opportunity now exists to unify these research streams. We envision a closed-loop sensor-AI-robot framework in which multimodal sensing continuously guides the interaction between the patient, caregiver, humanoid robot (and physician) through AI agents that are powered by a multitude of AI models such as robotic and wearables foundation models, LLM-based reasoning, reinforcement learning, and continual learning. Such a closed-loop system enables personalized, explainable, and context-aware intervention, forming the basis for a digital twin of the PD patient that can adapt over time to deliver intelligent, patient-centered PD care.
Scaling Agent Learning via Experience Synthesis
Chen, Zhaorun, Zhao, Zhuokai, Zhang, Kai, Liu, Bo, Qi, Qi, Wu, Yifan, Kalluri, Tarun, Cao, Sara, Xiong, Yuanhao, Tong, Haibo, Yao, Huaxiu, Li, Hengduo, Zhu, Jiacheng, Li, Xian, Song, Dawn, Li, Bo, Weston, Jason, Huynh, Dat
While reinforcement learning (RL) can empower autonomous agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model that derives consistent state transitions and feedback signals through step-by-step reasoning, enabling scalable agent rollout collection for RL. To improve the stability and quality of transitions, DreamGym leverages an experience replay buffer initialized with offline real-world data and continuously enriched with fresh interactions to actively support agent training. To improve knowledge acquisition, DreamGym adaptively generates new tasks that challenge the current agent policy, enabling more effective online curriculum learning. Experiments across diverse environments and agent backbones demonstrate that DreamGym substantially improves RL training, both in fully synthetic settings and in sim-to-real transfer scenarios. On non-RL-ready tasks like WebArena, DreamGym outperforms all baselines by over 30%. And in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. When transferring a policy trained purely on synthetic experiences to real-environment RL, DreamGym yields significant additional performance gains while requiring far fewer real-world interactions, providing a scalable warm-start strategy for general-purpose RL.
Conversational Collective Intelligence (CCI) using Hyperchat AI in a Real-world Forecasting Task
Schumann, Hans, Rosenberg, Louis, Mani, Ganesh, Willcox, Gregg
Hyperchat AI is a novel agentic technology that enables thoughtful conversations among networked human groups of potentially unlimited size. It allows large teams to discuss complex issues, brainstorm ideas, surface risks, assess alternatives and efficiently converge on optimized solutions that amplify the group's Collective Intelligence (CI). A formal study was conducted to quantify the forecasting accuracy of human groups using Hyperchat AI to conversationally predict the outcome of Major League Baseball (MLB) games. During an 8-week period, networked groups of approximately 24 sports fans were tasked with collaboratively forecasting the winners of 59 baseball games through real-time conversation facilitated by AI agents. The results showed that when debating the games using Hyperchat AI technology, the groups converged on High Confidence predictions that significantly outperformed Vegas betting markets. Specifically, groups were 78% accurate in their High Confidence picks, a statistically strong result vs the Vegas odds of 57% (p=0.020). Had the groups bet against the spread (ATS) on these games, they would have achieved a 46% ROI against Vegas betting markets. In addition, High Confidence forecasts that were generated through above-average conversation rates were 88% accurate, suggesting that real-time interactive deliberation is central to amplified accuracy.
Pure Vision Language Action (VLA) Models: A Comprehensive Survey
Zhang, Dapeng, Sun, Jing, Hu, Chenghui, Wu, Xiaoyan, Yuan, Zhenlong, Zhou, Rui, Shen, Fei, Zhou, Qingguo
The emergence of Vision Language Action (VLA) models marks a paradigm shift from traditional policy-based control to generalized robotics, reframing Vision Language Models (VLMs) from passive sequence generators into active agents for manipulation and decision-making in complex, dynamic environments. This survey delves into advanced VLA methods, aiming to provide a clear taxonomy and a systematic, comprehensive review of existing research. It presents a comprehensive analysis of VLA applications across different scenarios and classifies VLA approaches into several paradigms (autoregression-based, diffusion-based, reinforcement-based, hybrid, and specialized methods), while examining their motivations, core strategies, and implementations in detail. In addition, foundational datasets, benchmarks, and simulation platforms are introduced. Building on the current VLA landscape, the review further proposes perspectives on key challenges and future directions to advance research in VLA models and generalizable robotics. By synthesizing insights from over three hundred recent studies, this survey maps the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose VLA methods.
ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System
Han, Dong, Ai, Zhehong, Cai, Pengxiang, Lu, Shanya, Chen, Jianpeng, Ye, Zihao, Sun, Shuzhou, Gao, Ben, Ge, Lingli, Wang, Weida, Zhou, Xiangxin, Liu, Xihui, Su, Mao, Ouyang, Wanli, Bai, Lei, Zhou, Dongzhan, Xu, Tao, Li, Yuqiang, Zhang, Shufei
Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by sparse experimental data and a vast search space. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. Firstly, the data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% of labeled samples for pseudo-data generation, robustly initializing the optimization process. Secondly, the knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide the LLM in dividing the search space while mitigating LLM hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this established partition. Operating within the LLM-refined subspaces and supported by LLM-generated data, BO becomes both more effective and more efficient. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS sets a new state-of-the-art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
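The subspace-selection step mentioned in the ChemBOMAS abstract can be illustrated with a generic Upper Confidence Bound rule. This is a minimal sketch, assuming the standard UCB1 form (mean reward plus an exploration bonus); the abstract does not give the exact scoring rule, and the names `ucb_select`, `update`, and the constant `c` are hypothetical.

```python
import math

def ucb_select(counts, means, c=2.0):
    """Pick the index of the subspace with the highest UCB1 score.

    counts[i]: how many times subspace i has been evaluated (assumed bookkeeping).
    means[i]:  running mean of the objective observed in subspace i.
    c:         exploration constant (assumption, not taken from the paper).
    """
    # Evaluate every subspace at least once before comparing scores.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [m + math.sqrt(c * math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def update(counts, means, i, reward):
    """Fold a new observation into the running mean of subspace i."""
    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]
```

In a loop, one would call `ucb_select`, run an experiment (or a BO step) inside the chosen subspace, and pass the result to `update`. Rarely visited subspaces receive a large bonus, so they are revisited even when their current mean is low.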
Online Learning and Coverage of Unknown Fields Using Random-Feature Gaussian Processes
Du, Ruijie, Lin, Ruoyu, Shen, Yanning, Egerstedt, Magnus
This paper proposes a framework for multi-robot systems to perform simultaneous learning and coverage of a domain of interest characterized by an unknown and potentially time-varying density function. To overcome the limitations of Gaussian Process (GP) regression, we employ Random Feature GP (RFGP) and its online variant (O-RFGP), which enables online and incremental inference. By integrating these with Voronoi-based coverage control and an Upper Confidence Bound (UCB) sampling strategy, a team of robots can adaptively focus on important regions while refining the learned spatial field for efficient coverage. The incremental update mechanism of O-RFGP naturally supports time-varying environments, allowing efficient adaptation without retaining historical data. Furthermore, to the best of our knowledge, we provide the first theoretical analysis of online learning and coverage through a regret-based formulation, establishing asymptotic no-regret guarantees in the time-invariant setting. The effectiveness of the proposed framework is demonstrated through simulations with both time-invariant and time-varying density functions, along with a physical experiment with a time-varying density function.
ECom-Bench: Can LLM Agent Resolve Real-World E-commerce Customer Support Issues?
Wang, Haoxin, Peng, Xianhan, Huang, Xucheng, Huang, Yizhe, Gong, Ming, Yang, Chenghan, Liu, Yang, Jiang, Ling
In this paper, we introduce ECom-Bench, the first benchmark framework for evaluating LLM agents with multimodal capabilities in the e-commerce customer support domain. ECom-Bench features dynamic user simulation based on persona information collected from real e-commerce customer interactions and a realistic task dataset derived from authentic e-commerce dialogues. These tasks, covering a wide range of business scenarios, are designed to reflect real-world complexities, making ECom-Bench highly challenging. For instance, even advanced models like GPT-4o achieve only a 10-20% pass^3 metric in our benchmark, highlighting the substantial difficulties posed by complex e-commerce scenarios. The code and data have been made publicly available at https://github.com/XiaoduoAILab/ECom-Bench to facilitate further research and development in this domain.
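The ECom-Bench abstract reports a pass^3 score without defining it. The sketch below assumes the metric follows the pass^k convention used in some earlier agent benchmarks, where a task counts as solved only if k independent trials all succeed, estimated per task as C(c, k) / C(n, k) from c successes in n trials. The function names are hypothetical, and this is not taken from the ECom-Bench code.

```python
from math import comb

def pass_hat_k(successes, trials, k):
    """Estimate the chance that k independent trials of one task all succeed.

    Assumed estimator: C(successes, k) / C(trials, k).
    """
    if not 0 < k <= trials:
        raise ValueError("k must satisfy 0 < k <= trials")
    # math.comb returns 0 when successes < k, so the estimate is 0 there.
    return comb(successes, k) / comb(trials, k)

def benchmark_pass_hat_k(results, k):
    """Average the per-task estimate over (successes, trials) pairs."""
    return sum(pass_hat_k(c, n, k) for c, n in results) / len(results)
```

Under this reading, a task solved in 2 of 3 trials contributes 0 to pass^3, which is why the metric is much stricter than single-trial accuracy.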
IMPACT: Behavioral Intention-aware Multimodal Trajectory Prediction with Adaptive Context Trimming
Sun, Jiawei, Yue, Xibin, Li, Jiahui, Shen, Tianle, Yuan, Chengran, Sun, Shuo, Guo, Sheng, Zhou, Quanyun, Ang, Marcelo H Jr
While most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts both behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Specifically, we employ a shared context encoder for both intention and trajectory predictions, thereby reducing structural redundancy and information loss. Moreover, we address the lack of ground-truth behavioral intention labels in mainstream datasets (Waymo, Argoverse) by auto-labeling these datasets, thus advancing the community's efforts in this direction. We further introduce a vectorized occupancy prediction module that infers the probability of each map polyline being occupied by the target vehicle's future trajectory. By leveraging these intention and occupancy prediction priors, our method conducts dynamic, modality-dependent pruning of irrelevant agents and map polylines in the decoding stage, effectively reducing computational overhead and mitigating noise from non-critical elements. Our approach ranks first among LiDAR-free methods on the Waymo Motion Dataset and achieves first place on the Waymo Interactive Prediction Dataset. Remarkably, even without model ensembling, our single-model framework improves the soft mean average precision (softmAP) by 10 percent compared to the second-best method in the Waymo Interactive Prediction Leaderboard. Furthermore, the proposed framework has been successfully deployed on real vehicles, demonstrating its practical effectiveness in real-world applications.
Maestro: Learning to Collaborate via Conditional Listwise Policy Optimization for Multi-Agent LLMs
Yang, Wei, Pang, Jiacheng, Li, Shixuan, Bogdan, Paul, Tu, Stephen, Thomason, Jesse
Multi-agent systems (MAS) built on Large Language Models (LLMs) are being used to approach complex problems and can surpass single model inference. However, their success hinges on navigating a fundamental cognitive tension: the need to balance broad, divergent exploration of the solution space with a principled, convergent synthesis to the optimal solution. Existing paradigms often struggle to manage this duality, leading to premature consensus, error propagation, and a critical credit assignment problem that fails to distinguish between genuine reasoning and superficially plausible arguments. To operationalize this critical synthesis phase, we introduce Conditional Listwise Policy Optimization (CLPO), a reinforcement learning objective that disentangles signals for strategic decisions and tactical rationales. By combining decision-focused policy gradients with a list-wise ranking loss over justifications, CLPO achieves clean credit assignment and stronger comparative supervision.
The rise of large language models (LLMs) has enabled a new type of multi-agent system (MAS) (Park et al., 2023; Chen et al., 2023a; Zhu et al., 2025), where multiple model instances collaborate to tackle problems that exceed the capacity of any single model (Zhang et al., 2024a; Qiao et al., 2024; Han et al., 2025). By distributing roles and enabling structured interaction, MASs hold the promise of achieving robustness, creativity, and reliability that emerge from collective intelligence (Cheng et al., 2024; Pezeshkpour et al., 2024). At the heart of any effective collaborative system lies a fundamental cognitive tension. Early work in the psychology of creativity (Runco & Chand, 1995; Brophy, 2001; Zhang et al., 2020) emphasizes that intelligent problem-solving requires a dynamic balance between two seemingly contradictory modes of thought: Divergent Creativity and Convergent Critique. Guilford's theory of divergent and convergent thinking (Guilford, 1967) formalizes this duality: divergence is the generative process of exploring a wide array of alternative hypotheses, while convergence is the evaluative process of comparing, refining, and synthesizing these options.
An Epistemic Perspective on Agent Awareness
Naumov, Pavel, Pavlova, Alexandra
The paper proposes to treat agent awareness as a form of knowledge, breaking the tradition in the existing literature on awareness. It distinguishes the de re and de dicto forms of such knowledge. The work introduces two modalities capturing these forms and formally specifies their meaning using a version of 2D-semantics. The main technical result is a sound and complete logical system describing the interplay between the two proposed modalities and the standard "knowledge of the fact" modality.