Agents
GraspMAS: Zero-Shot Language-driven Grasp Detection with Multi-Agent System
Nguyen, Quang, Le, Tri, Nguyen, Huy, Vo, Thieu, Ta, Tung D., Huang, Baoru, Vu, Minh N., Nguyen, Anh
Language-driven grasp detection has the potential to revolutionize human-robot interaction by allowing robots to understand and execute grasping tasks based on natural language commands. However, existing approaches face two key challenges. First, they often struggle to interpret complex text instructions or operate ineffectively in densely cluttered environments. Second, most methods require a training or finetuning step to adapt to new domains, limiting their generalization in real-world applications. In this paper, we introduce GraspMAS, a new multi-agent system framework for language-driven grasp detection. GraspMAS is designed to reason through ambiguities and improve decision-making in real-world scenarios. Our framework consists of three specialized agents: Planner, responsible for strategizing complex queries; Coder, which generates and executes source code; and Observer, which evaluates the outcomes and provides feedback. Intensive experiments on two large-scale datasets demonstrate that our GraspMAS significantly outperforms existing baselines. Additionally, robot experiments conducted in both simulation and real-world settings further validate the effectiveness of our approach. Our project page is available at https://zquang2202.github.io/GraspMAS
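The Planner, Coder, and Observer roles described in this abstract suggest a feedback loop, which might be sketched as follows. This is only an illustration under stated assumptions: `run_agents`, `Feedback`, the `Grasp` tuple layout, and the toy agents are hypothetical names invented here, and the abstract does not specify how GraspMAS actually wires its agents together.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical grasp representation: (x, y, width, height, angle).
Grasp = Tuple[float, float, float, float, float]


@dataclass
class Feedback:
    ok: bool   # whether the Observer accepts the result
    note: str  # free-text hint passed back to the Planner


def run_agents(
    query: str,
    planner: Callable[[str, Optional[Feedback]], str],
    coder: Callable[[str], Optional[Grasp]],
    observer: Callable[[str, Optional[Grasp]], Feedback],
    max_rounds: int = 3,
) -> Optional[Grasp]:
    """Plan, execute, and evaluate until the Observer accepts or rounds run out."""
    feedback: Optional[Feedback] = None
    for _ in range(max_rounds):
        plan = planner(query, feedback)     # Planner: strategize the query
        result = coder(plan)                # Coder: generate and run code for the plan
        feedback = observer(query, result)  # Observer: evaluate the outcome
        if feedback.ok:
            return result
    return None


# Toy stand-ins (assumptions) so the loop can be exercised without any model.
def toy_planner(query: str, feedback: Optional[Feedback]) -> str:
    if feedback is None:
        return "locate object"
    return "locate object, then refine: " + feedback.note


def toy_coder(plan: str) -> Optional[Grasp]:
    return (120.0, 80.0, 40.0, 20.0, 0.5) if "refine" in plan else None


def toy_observer(query: str, result: Optional[Grasp]) -> Feedback:
    if result is None:
        return Feedback(False, "no grasp returned")
    return Feedback(True, "grasp found")
```

With these toy agents, the first round fails and the Observer's note prompts the Planner to refine its plan, so a grasp is returned on the second round.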
SciSage: A Multi-Agent Framework for High-Quality Scientific Survey Generation
Shi, Xiaofeng, Kou, Qian, Li, Yuduo, Tang, Ning, Xie, Jinxin, Yu, Longbin, Wang, Songjing, Zhou, Hua
The rapid growth of scientific literature demands robust tools for automated survey generation. However, current large language model (LLM)-based methods often lack in-depth analysis, structural coherence, and reliable citations. To address these limitations, we introduce SciSage, a multi-agent framework employing a reflect-when-you-write paradigm. SciSage features a hierarchical Reflector agent that critically evaluates drafts at outline, section, and document levels, collaborating with specialized agents for query interpretation, content retrieval, and refinement. We also release SurveyScope, a rigorously curated benchmark of 46 high-impact papers (2020-2025) across 11 computer science domains, with strict recency and citation-based quality controls. Evaluations demonstrate that SciSage outperforms state-of-the-art baselines (LLM×MapReduce-V2, AutoSurvey), achieving +1.73 points in document coherence and +32% in citation F1 scores. Human evaluations reveal mixed outcomes (3 wins vs. 7 losses against human-written surveys), but highlight SciSage's strengths in topical breadth and retrieval efficiency. Overall, SciSage offers a promising foundation for research-assistive writing tools.
AutoGen Driven Multi Agent Framework for Iterative Crime Data Analysis and Prediction
Fatima, Syeda Kisaa, Zubair, Tehreem, Ahmed, Noman, Khan, Asifullah
Figure 4: Plot over 100 epochs with 3 agents

F. Ablation Study: Impact of the LearningOptimizerAgent

To quantify the OptimizerAgent's effect on the system, we conducted an ablation study with two configurations:

Baseline (3-agent framework): CrimeAnalysisAssistant, FeedbackAgent, and CrimePredictorAgent.
Extended (4-agent framework): all of the above, plus the OptimizerAgent, which oversees and controls how the other agents work.

Both configurations were tested under the same protocol, on the same data for 100 epochs, and evaluated with the metrics described in Section V-B. Importantly, in the extended-framework tests the OptimizerAgent had no access to the ground truth; its actions reflected those of a real-world supervisor trying to use resources efficiently. The main aim was to obtain greater stability and a better learning curve with our framework, LUCID-MA.

Table 2: Observed improvement with 4 agents

Metric | Baseline (3 agents) | With OptimizerAgent | Improvement
CrimeAnalysisAssistant Final Score | 0.94 | 0.96 | +0.02
FeedbackAgent Final Score | 0.89 | 0.92 | +0.03
CrimePredictorAgent Final Score | 0.85 | 0.91 | +0.06
Avg. Redundancy Across Epochs | 14.2% | 6.8% | -7.4%

Using the OptimizerAgent resulted in a marked increase in the variety and quality of final system outputs.

Visual Result: The final plot demonstrates the effect of agent-level meta-control. As a result, the model exhibits higher consistency, greater variety in its results, and more reliable improvement over time, all accomplished without any need for further model fine-tuning.

Figure 5: Plot over 100 epochs with 4 agents

In addition to standard performance comparison metrics, our system displayed advanced behavioral dynamics pointing to the presence of emergent intelligence capabilities, which we examine in detail in the next section.
Specification and Evaluation of Multi-Agent LLM Systems -- Prototype and Cybersecurity Applications
Recent advancements in LLMs indicate potential for novel applications, as evidenced by the reasoning capabilities in the latest OpenAI and DeepSeek models. To apply these models to domain-specific applications beyond text generation, LLM-based multi-agent systems can be utilized to solve complex tasks, particularly by combining reasoning techniques, code generation, and software execution across multiple, potentially specialized LLMs. However, while many evaluations are performed on LLMs, reasoning techniques, and applications individually, their joint specification and combined application are not well understood. Defined specifications for multi-agent LLM systems are required to explore their potential and suitability for specific applications, allowing for systematic evaluations of LLMs, reasoning techniques, and related aspects. This paper reports the results of exploratory research on (1) multi-agent specification by introducing an agent schema language and (2) the execution and evaluation of the specifications through a multi-agent system architecture and prototype. The specification language, system architecture, and prototype are first presented in this work, building on an LLM system from prior research. Test cases involving cybersecurity tasks indicate the feasibility of the architecture and evaluation approach. As a result, evaluations could be demonstrated for question answering, server security, and network security tasks completed correctly by agents with LLMs from OpenAI and DeepSeek.
From Mind to Machine: The Rise of Manus AI as a Fully Autonomous Digital Agent
Shen, Minjie, Li, Yanshu, Chen, Lulu, Yang, Qikai
Manus AI is a general-purpose AI agent introduced in early 2025, marking a significant advancement in autonomous artificial intelligence. Developed by the Chinese startup Monica.im, Manus is designed to bridge the gap between "mind" and "hand" - combining the reasoning and planning capabilities of large language models with the ability to execute complex, end-to-end tasks that produce tangible outcomes. This paper presents a comprehensive overview of Manus AI, exploring its core technical architecture, diverse applications across sectors such as healthcare, finance, manufacturing, robotics, and gaming, as well as its key strengths, current limitations, and future potential. Positioned as a preview of what lies ahead, Manus AI represents a shift toward intelligent agents that can translate high-level intentions into real-world actions, heralding a new era of human-AI collaboration.
A Vision for Auto Research with LLM Agents
Liu, Chengwei, Wang, Chong, Cao, Jiayue, Ge, Jingquan, Wang, Kun, Zhang, Lyuye, Cheng, Ming-Ming, Zhao, Penghai, Li, Tianlin, Jia, Xiaojun, Li, Xiang, Li, Xingshuai, Liu, Yang, Feng, Yebo, Huang, Yihao, Xu, Yijia, Sun, Yuqiang, Zhou, Zhenhong, Xu, Zhengzi
This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. Leveraging the capabilities of large language models (LLMs) and modular agent collaboration, the system spans all major research phases, including literature review, ideation, methodology planning, experimentation, paper writing, peer review response, and dissemination. By addressing issues such as fragmented workflows, uneven methodological expertise, and cognitive overload, the framework offers a systematic and scalable approach to scientific inquiry. Preliminary explorations demonstrate the feasibility and potential of Auto Research as a promising paradigm for self-improving, AI-driven research processes.
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Prabhakar, Akshara, Liu, Zuxin, Zhu, Ming, Zhang, Jianguo, Awalgaonkar, Tulika, Wang, Shiyu, Liu, Zhiwei, Chen, Haolin, Hoang, Thai, Niebles, Juan Carlos, Heinecke, Shelby, Yao, Weiran, Wang, Huan, Savarese, Silvio, Xiong, Caiming
Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. These blueprints are then transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on $\tau$-bench and BFCL benchmarks, with the smaller models surpassing their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source 5K synthetic data trajectories and the trained xLAM-2-fc-r models to advance research in AI agents. Models at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4; Dataset at https://huggingface.co/datasets/Salesforce/APIGen-MT-5k and Website at https://apigen-mt.github.io
Preference-based Multi-Objective Reinforcement Learning
Mu, Ni, Luan, Yao, Jia, Qing-Shan
Multi-objective reinforcement learning (MORL) is a structured approach for optimizing tasks with multiple objectives. However, it often relies on pre-defined reward functions, which can be hard to design for balancing conflicting goals and may lead to oversimplification. Preferences can serve as more flexible and intuitive decision-making guidance, eliminating the need for complicated reward design. This paper introduces preference-based MORL (Pb-MORL), which formalizes the integration of preferences into the MORL framework. We theoretically prove that preferences can derive policies across the entire Pareto frontier. To guide policy optimization using preferences, our method constructs a multi-objective reward model that aligns with the given preferences. We further provide theoretical proof to show that optimizing this reward model is equivalent to training the Pareto optimal policy. Extensive experiments in benchmark multi-objective tasks, a multi-energy management task, and an autonomous driving task on a multi-lane highway show that our method performs competitively, surpassing the oracle method, which uses the ground truth reward function. This highlights its potential for practical applications in complex real-world systems.
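To make the idea of a reward model aligned with preferences more concrete, here is a minimal sketch of one common way to score a pairwise preference over multi-objective returns. It assumes a Bradley-Terry preference model and linear scalarization by a weight vector; both are assumptions chosen for illustration, since the abstract does not state which preference model or scalarization Pb-MORL uses. All function names are hypothetical.

```python
import math
from typing import Sequence


def scalarize(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-objective returns (linear scalarization, assumed)."""
    return sum(r * w for r, w in zip(returns, weights))


def preference_prob(
    returns_a: Sequence[float],
    returns_b: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Bradley-Terry probability that segment A is preferred to segment B."""
    diff = scalarize(returns_a, weights) - scalarize(returns_b, weights)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(
    returns_a: Sequence[float],
    returns_b: Sequence[float],
    weights: Sequence[float],
    a_preferred: bool,
) -> float:
    """Cross-entropy loss of one labeled preference under the model."""
    p = preference_prob(returns_a, returns_b, weights)
    return -math.log(p if a_preferred else 1.0 - p)
```

In this sketch, two segments that trade off the objectives symmetrically are equally likely to be preferred under equal weights, while shifting the weights toward one objective favors the segment that does better on it. Minimizing `preference_loss` over labeled pairs is how a learned reward model would typically be fitted in this style of method.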
Byzantine-resilient federated online learning for Gaussian process regression
Zhang, Xu, Yuan, Zhenyuan, Zhu, Minghui
In this paper, we study Byzantine-resilient federated online learning for Gaussian process regression (GPR). We develop a Byzantine-resilient federated GPR algorithm that allows a cloud and a group of agents to collaboratively learn a latent function and improve learning performance when some agents exhibit Byzantine failures, i.e., arbitrary and potentially adversarial behavior. Each agent-based local GPR sends potentially compromised local predictions to the cloud, and the cloud-based aggregated GPR computes a global model by a Byzantine-resilient product of experts aggregation rule. Then the cloud broadcasts the current global model to all the agents. Agent-based fused GPR refines local predictions by fusing the received global model with that of the agent-based local GPR. Moreover, we quantify the learning accuracy improvements of the agent-based fused GPR over the agent-based local GPR. Experiments on a toy example and two medium-scale real-world datasets are conducted to demonstrate the performance of the proposed algorithm.
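The aggregation step at the cloud can be illustrated with a small sketch. The standard product of experts rule combines Gaussian predictions by summing precisions and precision-weighting the means. To illustrate Byzantine resilience, the sketch below trims the predictions with the most extreme means before combining. The trimming step is a stand-in chosen for illustration, not the paper's rule, which the abstract does not specify; the function names and the `num_byzantine` parameter are hypothetical.

```python
from typing import List, Tuple

# Each local prediction at one test point is a (mean, variance) pair.
Prediction = Tuple[float, float]


def product_of_experts(predictions: List[Prediction]) -> Prediction:
    """Standard product of experts: precisions add, means are precision-weighted."""
    precision = sum(1.0 / var for _, var in predictions)
    mean = sum(m / var for m, var in predictions) / precision
    return mean, 1.0 / precision


def trimmed_product_of_experts(
    predictions: List[Prediction], num_byzantine: int
) -> Prediction:
    """Drop the num_byzantine lowest and highest means, then apply the product rule.

    Trimming is an assumed resilience mechanism used only for this illustration.
    """
    ordered = sorted(predictions, key=lambda p: p[0])
    kept = ordered[num_byzantine: len(ordered) - num_byzantine]
    if not kept:
        raise ValueError("too few predictions for the requested trimming")
    return product_of_experts(kept)
```

For example, with three honest predictions near 1.0 and one adversarial prediction of 100.0 reported with a very small variance, the plain product rule is dominated by the adversarial agent, whereas trimming one prediction from each end keeps the aggregate close to the honest values.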
A Minimalist Controller for Autonomously Self-Aggregating Robotic Swarms: Enabling Compact Formations in Multitasking Scenarios
de Macedo, Maria Eduarda Silva, de Souza, Ana Paula Chiarelli, Rosso, Roberto Silvio Ubertino Jr., Lopes, Yuri Kaszubowski
The deployment of simple emergent behaviors in swarm robotics has been well-rehearsed in the literature. A recent study has shown how self-aggregation is possible in a multitask approach -- where multiple self-aggregation task instances occur concurrently in the same environment. The multitask approach poses new challenges, in particular, how the dynamics of each group impact the performance of others. So far, multitask self-aggregation of groups of robots either produces a circular formation that is not fully compact or is not fully autonomous. In this paper, we present a multitask self-aggregation where groups of homogeneous robots sort themselves into different compact clusters, relying solely on a line-of-sight sensor. Our multitask self-aggregation behavior was able to scale well and achieve a compact formation. We report scalability results from a series of simulation trials with different configurations in the number of groups and the number of robots per group. We were able to improve the multitask self-aggregation behavior performance in terms of the compactness of the clusters, keeping the proportion of clustered robots found in other studies.