Agents
AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing
Ren, Yinwang, Liu, Yangyang, Ji, Tang, Xu, Xun
AI agents are autonomous systems designed to perceive, reason, and act within dynamic environments. With the rapid advancements in generative AI (GenAI), large language models (LLMs) and multimodal large language models (MLLMs) have significantly improved AI agents' capabilities in semantic comprehension, complex reasoning, and autonomous decision-making. At the same time, the rise of Agentic AI highlights adaptability and goal-directed autonomy in dynamic and complex environments. LLMs-based AI Agents (LLM-Agents), MLLMs-based AI Agents (MLLM-Agents), and Agentic AI contribute to expanding AI's capabilities in information processing, environmental perception, and autonomous decision-making, opening new avenues for smart manufacturing. However, the definitions, capability boundaries, and practical applications of these emerging AI paradigms in smart manufacturing remain unclear. To address this gap, this study systematically reviews the evolution of AI and AI agent technologies, examines the core concepts and technological advancements of LLM-Agents, MLLM-Agents, and Agentic AI, and explores their potential applications in and integration into manufacturing, along with the potential challenges they may face. Preprint submitted to Journal of Manufacturing System July 3, 2025 1. Introduction As a complex and data-intensive domain, manufacturing faces increasing challenges due to the increasing demand for customization, shorter product life cycles, and intense global competition [1, 2]. Traditional automated systems, reliant on fixed rules, struggle to adapt to evolving customer needs.
BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments
Qiu, Yibo, Huang, Zan, Wang, Zhiyu, Liu, Handi, Qiao, Yiling, Hu, Yifeng, Sun, Shu'ang, Peng, Hangke, Xu, Ronald X, Sun, Mingzhai
Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Yet, their application remains constrained by rigid protocol design, limited adaptability to dynamic lab conditions, inadequate error handling, and high operational complexity. Here we introduce BioMARS (Biological Multi-Agent Robotic System), an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. BioMARS uses a hierarchical architecture: the Biologist Agent synthesizes protocols via retrieval-augmented generation; the Technician Agent translates them into executable robotic pseudo-code; and the Inspector Agent ensures procedural integrity through multimodal perception and anomaly detection. The system autonomously conducts cell passaging and culture tasks, matching or exceeding manual performance in viability, consistency, and morphological integrity. It also supports context-aware optimization, outperforming conventional strategies in differentiating retinal pigment epithelial cells. A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware. These results highlight the feasibility of generalizable, AI-driven laboratory automation and the transformative role of language-based reasoning in biological research.
Optimal Dispersion Under Asynchrony
Pattanayak, Debasish, Kshemkalyani, Ajay D., Kumar, Manish, Molla, Anisur Rahaman, Sharma, Gokarna
The problem of dispersion of mobile agents studied extensively in recent distributed computing literature not only takes its inspiration from biological phenomena, such as damselfish establishing non-overlapping territories on coral reefs [6], or neural crest cells migrating and distributing themselves across the developing embryo [28]; but also with practical applications such as placing a fleet of small autonomous robots (agents) under shelves (nodes) in fulfillment centers [41]. It is also closely connected to other coordination tasks such as exploration, scattering, load balancing, and self-deployment [5, 10, 12, 14, 16, 39]. The dispersion problem, denoted as dispersion, involves k n mobile agents placed initially arbitrarily on the nodes of an n -node anonymous graph of maximum degree . The goal for the agents is to autonomously relocate such that each agent is on a distinct node of the graph (see Figure 1). The objective is to design algorithms that simultaneously optimize time and memory complexities. Time complexity is the total time required to achieve dispersion starting from any initial configuration. Memory complexity is the maximum number of bits stored in the persistent memory at each agent throughout the execution. We stress that graph nodes are memory-less and cannot store any information.
Zero-Incentive Dynamics: a look at reward sparsity through the lens of unrewarded subgoals
Molinghen, Yannick, Lenaerts, Tom
This work re-examines the commonly held assumption that the frequency of rewards is a reliable measure of task difficulty in reinforcement learning. We identify and formalize a structural challenge that undermines the effectiveness of current policy learning methods: when essential subgoals do not directly yield rewards. We characterize such settings as exhibiting zero-incentive dynamics, where transitions critical to success remain unrewarded. We show that state-of-the-art deep subgoal-based algorithms fail to leverage these dynamics and that learning performance is highly sensitive to the temporal proximity between subgoal completion and eventual reward. These findings reveal a fundamental limitation in current approaches and point to the need for mechanisms that can infer latent task structure without relying on immediate incentives.
Data Agent: A Holistic Architecture for Orchestrating Data+AI Ecosystems
Sun, Zhaoyan, Wang, Jiayi, Zhao, Xinyang, Wang, Jiachi, Li, Guoliang
Traditional Data+AI systems utilize data-driven techniques to optimize performance, but they rely heavily on human experts to orchestrate system pipelines, enabling them to adapt to changes in data, queries, tasks, and environments. For instance, while there are numerous data science tools available, developing a pipeline planning system to coordinate these tools remains challenging. This difficulty arises because existing Data+AI systems have limited capabilities in semantic understanding, reasoning, and planning. Fortunately, we have witnessed the success of large language models (LLMs) in enhancing semantic understanding, reasoning, and planning abilities. It is crucial to incorporate LLM techniques to revolutionize data systems for orchestrating Data+AI applications effectively. To achieve this, we propose the concept of a 'Data Agent' - a comprehensive architecture designed to orchestrate Data+AI ecosystems, which focuses on tackling data-related tasks by integrating knowledge comprehension, reasoning, and planning capabilities. We delve into the challenges involved in designing data agents, such as understanding data/queries/environments/tools, orchestrating pipelines/workflows, optimizing and executing pipelines, and fostering pipeline self-reflection. Furthermore, we present examples of data agent systems, including a data science agent, data analytics agents (such as unstructured data analytics agent, semantic structured data analytics agent, data lake analytics agent, and multi-modal data analytics agent), and a database administrator (DBA) agent. We also outline several open challenges associated with designing data agent systems.
Optimizing Conversational Product Recommendation via Reinforcement Learning
We propose a reinforcement learning-based approach to optimize conversational strategies for product recommendation across diverse industries. As organizations increasingly adopt intelligent agents to support sales and service operations, the effectiveness of a conversation hinges not only on what is recommended but how and when recommendations are delivered. We explore a methodology where agentic systems learn optimal dialogue policies through feedback-driven reinforcement learning. By mining aggregate behavioral patterns and conversion outcomes, our approach enables agents to refine talk tracks that drive higher engagement and product uptake, while adhering to contextual and regulatory constraints. We outline the conceptual framework, highlight key innovations, and discuss the implications for scalable, personalized recommendation in enterprise environments.
AdamMeme: Adaptively Probe the Reasoning Capacity of Multimodal Large Language Models on Harmfulness
Chen, Zixin, Lin, Hongzhan, Li, Kaixin, Luo, Ziyang, Ye, Zhen, Chen, Guang, Huang, Zhiyong, Ma, Jing
The proliferation of multimodal memes in the social media era demands that multimodal Large Language Models (mLLMs) effectively understand meme harmfulness. Existing benchmarks for assessing mLLMs on harmful meme understanding rely on accuracy-based, model-agnostic evaluations using static datasets. These benchmarks are limited in their ability to provide up-to-date and thorough assessments, as online memes evolve dynamically. To address this, we propose AdamMeme, a flexible, agent-based evaluation framework that adaptively probes the reasoning capabilities of mLLMs in deciphering meme harmfulness. Through multi-agent collaboration, AdamMeme provides comprehensive evaluations by iteratively updating the meme data with challenging samples, thereby exposing specific limitations in how mLLMs interpret harmfulness. Extensive experiments show that our framework systematically reveals the varying performance of different target mLLMs, offering in-depth, fine-grained analyses of model-specific weaknesses. Our code is available at https://github.com/Lbotirx/AdamMeme.
What does really matter in image goal navigation?
Monaci, Gianluca, Weinzaepfel, Philippe, Wolf, Christian
Image goal navigation requires two different skills: firstly, core navigation skills, including the detection of free space and obstacles, and taking decisions based on an internal representation; and secondly, computing directional information by comparing visual observations to the goal image. Current state-of-the-art methods either rely on dedicated image-matching, or pre-training of computer vision modules on relative pose estimation. In this paper, we study whether this task can be efficiently solved with end-to-end training of full agents with RL, as has been claimed by recent work. A positive answer would have impact beyond Embodied AI and allow training of relative pose estimation from reward for navigation alone. In this large experimental study we investigate the effect of architectural choices like late fusion, channel stacking, space-to-depth projections and cross-attention, and their role in the emergence of relative pose estimators from navigation training. W e show that the success of recent methods is influenced up to a certain extent by simulator settings, leading to shortcuts in simulation. However, we also show that these capabilities can be transferred to more realistic setting, up to some extent. W e also find evidence for correlations between navigation performance and probed (emerging) relative pose estimation performance, an important sub skill.
Approximation-free Control of Unknown Euler-Lagrangian Systems under Input Constraints
Das, Ratnangshu, Jagtap, Pushpak
In this paper, we present a novel funnel-based tracking control algorithm for robotic systems with unknown dynamics and prescribed input constraints. The Euler-Lagrange formulation, a common modeling approach for robotic systems, has been adopted in this study to address the trade-off between performance and actuator safety. We establish feasibility conditions that ensure tracking errors evolve within predefined funnel bounds while maintaining bounded control efforts, a crucial consideration for robots with limited actuation capabilities. We propose two approximation-free control strategies for scenarios where these conditions are violated: one actively corrects the error, and the other stops further deviation. Finally, we demonstrate the robust performance and safety of the approach through simulations and experimental validations. This work represents a significant advancement in funnel-based control, enhancing its applicability to real-world robotics systems with input constraints.
Evaluating LLM Agent Collusion in Double Auctions
Agrawal, Kushal, Teo, Verona, Vazquez, Juan J., Kunnavakkam, Sudarsh, Srikanth, Vishak, Liu, Andy
Large language models (LLMs) have demonstrated impressive capabilities as autonomous agents with rapidly expanding applications in various domains. As these agents increasingly engage in socioeconomic interactions, identifying their potential for undesirable behavior becomes essential. In this work, we examine scenarios where they can choose to collude, defined as secretive cooperation that harms another party. To systematically study this, we investigate the behavior of LLM agents acting as sellers in simulated continuous double auction markets. Through a series of controlled experiments, we analyze how parameters such as the ability to communicate, choice of model, and presence of environmental pressures affect the stability and emergence of seller collusion. We find that direct seller communication increases collusive tendencies, the propensity to collude varies across models, and environmental pressures, such as oversight and urgency from authority figures, influence collusive behavior. Our findings highlight important economic and ethical considerations for the deployment of LLM-based market agents.