Agents
Personality-Driven Decision-Making in LLM-Based Autonomous Agents
Newsham, Lewis, Prince, Daniel
The embedding of Large Language Models (LLMs) into autonomous agents is a rapidly developing field which enables dynamic, configurable behaviours without the need for extensive domain-specific training. In our previous work, we introduced SANDMAN, a Deceptive Agent architecture leveraging the Five-Factor OCEAN personality model, demonstrating that personality induction significantly influences agent task planning. Building on these findings, this study presents a novel method for measuring and evaluating how induced personality traits affect task selection processes - specifically planning, scheduling, and decision-making - in LLM-based agents. Our results reveal distinct task-selection patterns aligned with induced OCEAN attributes, underscoring the feasibility of designing highly plausible Deceptive Agents for proactive cyber defense strategies.
Asynchronous Multi-Agent Systems with Petri nets
Adobbati, Federica, Mikulski, Łukasz
Modeling the interaction between components is crucial for many applications and serves as a fundamental step in analyzing and verifying properties in multi-agent systems. In this paper, we propose a method based on 1-safe Petri nets to model Asynchronous Multi-Agent Systems (AMAS), starting from two semantics defined on AMAS represented as transition systems. Specifically, we focus on two types of synchronization: synchronization on transitions and synchronization on data. For both, we define an operator that composes 1-safe Petri nets and demonstrate the relationships between the composed Petri net and the global transition systems as defined in the literature. Additionally, we analyze the relationships between the two semantics on Petri nets, proposing two constructions that enable switching between them. These transformations are particularly useful for system analysis, as they allow the selection of the most suitable model based on the property that needs to be verified.
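As an illustration of synchronization on transitions, the following minimal Python sketch composes two small 1-safe nets by fusing transitions that share a label. It is not the operator defined in the paper: the `Transition` class, the `compose_on_transitions` function, the contact-free enabling rule, and the assumption that the two nets have disjoint place names are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    label: str
    pre: frozenset   # input places
    post: frozenset  # output places


def enabled(marking, t):
    # Assumed 1-safe rule: every input place is marked and no output place
    # (other than a consumed input) already holds a token.
    return t.pre <= marking and not ((t.post - t.pre) & marking)


def fire(marking, t):
    if not enabled(marking, t):
        raise ValueError(f"transition {t.label!r} is not enabled")
    return (marking - t.pre) | t.post


def compose_on_transitions(net_a, net_b):
    # Hypothetical composition: transitions with a shared label are fused
    # (union of pre-sets and post-sets); all others are kept unchanged.
    # Place names of the two nets are assumed to be disjoint.
    shared = {t.label for t in net_a} & {t.label for t in net_b}
    composed = [t for t in net_a + net_b if t.label not in shared]
    for ta in net_a:
        for tb in net_b:
            if ta.label == tb.label:
                composed.append(
                    Transition(ta.label, ta.pre | tb.pre, ta.post | tb.post)
                )
    return composed


# Agent A waits to synchronize; agent B must do local work first.
agent_a = [Transition("sync", frozenset({"a0"}), frozenset({"a1"}))]
agent_b = [
    Transition("work", frozenset({"b0"}), frozenset({"b1"})),
    Transition("sync", frozenset({"b1"}), frozenset({"b2"})),
]
system = {t.label: t for t in compose_on_transitions(agent_a, agent_b)}

m0 = frozenset({"a0", "b0"})      # initial global marking
m1 = fire(m0, system["work"])     # B works alone; A is unaffected
m2 = fire(m1, system["sync"])     # both agents move together
```

In this sketch the fused `sync` transition is not enabled in the initial marking, because agent B has not yet reached its synchronization point; it becomes enabled only after the local `work` step.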
Provably Stable Multi-Agent Routing with Bounded-Delay Adversaries in the Decision Loop
Francos, Roee M., Garces, Daniel, Gil, Stephanie
In this work, we are interested in studying multi-agent routing settings, where adversarial agents are part of the assignment and decision loop, degrading the performance of the fleet by incurring bounded delays while servicing pickup-and-delivery requests. Specifically, we are interested in characterizing conditions on the fleet size and the proportion of adversarial agents for which a routing policy remains stable, where stability for a routing policy is achieved if the number of outstanding requests is uniformly bounded over time. To obtain this characterization, we first establish a threshold on the proportion of adversarial agents above which previously stable routing policies for fully cooperative fleets are provably unstable. We then derive a sufficient condition on the fleet size to recover stability given a maximum proportion of adversarial agents. We empirically validate our theoretical results on a case study on autonomous taxi routing, where we consider transportation requests from real San Francisco taxicab data.
In this paper we focus on a routing setting where a fleet of agents must pick up and deliver stochastically appearing requests. This stochastic setup is common in mobility-on-demand [1], [2], [3] and warehouse logistics [4], [5], where the location and quantity of future requests are unknown in advance. We assume that each agent handles one request at a time. In our setup, a subset of agents in the fleet may act adversarially by deviating from the prescribed plan set by the centralized control system, resulting in longer than expected service times for their assigned requests. This service delay model is inspired by operations research studies [6], particularly in transportation and delivery systems [7], [8], where drivers, after accepting a request, may pause for personal breaks or take longer routes to increase earnings when compensated per mile.
We assume that if the agents take too long to service a request, then the system will remove them; hence, agents can only incur a bounded delay. Hereafter we refer to this as the bounded-delay model for adversaries. Our objective in this paper is then to characterize conditions on the fleet size and the proportion of adversarial agents in the system for which a routing policy is provably stable in the presence of bounded-delay adversarial agents, where a stable routing policy is one that guarantees that the number of outstanding requests is uniformly bounded over time.
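To give a rough feel for how fleet size and the adversarial proportion interact, the sketch below runs a back-of-the-envelope capacity check in Python. It is not the stability condition derived in the paper: it assumes a simplified model in which every request takes a fixed `base_time` to serve, adversarial agents always add the full `max_delay`, and the backlog stays bounded only if the arrival rate is strictly below the aggregate service rate. All function and parameter names are hypothetical.

```python
import math


def service_capacity(fleet_size, adversarial_fraction, base_time, max_delay):
    """Aggregate requests served per unit time under the simplified model."""
    n_adv = adversarial_fraction * fleet_size
    n_coop = fleet_size - n_adv
    # Cooperative agents finish one request every base_time; adversarial
    # agents are assumed to always take base_time + max_delay.
    return n_coop / base_time + n_adv / (base_time + max_delay)


def min_fleet_size(arrival_rate, adversarial_fraction, base_time, max_delay):
    """Smallest fleet whose capacity strictly exceeds the arrival rate."""
    per_agent = ((1 - adversarial_fraction) / base_time
                 + adversarial_fraction / (base_time + max_delay))
    return math.floor(arrival_rate / per_agent) + 1


# Illustrative numbers only: 2 requests per minute, 10-minute trips,
# half the fleet adversarial with up to 10 minutes of extra delay.
needed = min_fleet_size(arrival_rate=2.0, adversarial_fraction=0.5,
                        base_time=10.0, max_delay=10.0)
print(needed, service_capacity(needed, 0.5, 10.0, 10.0))
```

Under these assumptions, raising the adversarial fraction or the maximum delay lowers the per-agent service rate, so a larger fleet is needed to keep the number of outstanding requests from growing without bound.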
Visual Environment-Interactive Planning for Embodied Complex-Question Answering
Lan, Ning, Ou, Baoshan, Xie, Xuemei, Shi, Guangming
This study focuses on the Embodied Complex-Question Answering task, in which an embodied robot needs to understand human questions with intricate structures and abstract semantics. The core of this task lies in making appropriate plans based on the perception of the visual environment. Existing methods often generate plans in a once-for-all manner, i.e., one-step planning. Such approaches rely on large models without sufficient understanding of the environment. Considering multi-step planning, this paper proposes a framework for formulating plans in a sequential manner. To ensure the ability of our framework to tackle complex questions, we create a structured semantic space, where hierarchical visual perception and chain expression of the question essence can achieve iterative interaction. This space makes sequential task planning possible. Within the framework, we first parse human natural language based on a visual hierarchical scene graph, which can clarify the intention of the question. Then, we incorporate external rules to make a plan for the current step, weakening the reliance on large models. Every plan is generated based on feedback from visual perception, with multiple rounds of interaction until an answer is obtained. This approach enables continuous feedback and adjustment, allowing the robot to optimize its action strategy. To test our framework, we contribute a new dataset with more complex questions. Experimental results demonstrate that our approach performs excellently and stably on complex tasks. The feasibility of our approach in real-world scenarios has also been established, indicating its practical applicability.
Index Terms: Embodied complex-question answering, task planning, language parsing, structured semantic space.
The development of versatile embodied agents capable of understanding natural language commands in indoor environments and executing various tasks through visual interaction has been a long-standing goal.
Automated detection of atomicity violations in large-scale systems
He, Hang, Luo, Yixing, Wan, Chengcheng, Su, Ting, Sun, Haiying, Pu, Geguang
Atomicity violations in interrupt-driven programs pose a significant threat to software safety in critical systems. These violations occur when the execution sequence of operations on shared resources is disrupted by asynchronous interrupts. Detecting atomicity violations is challenging due to the vast program state space, application-level code dependencies, and complex domain-specific knowledge. We propose Clover, a hybrid framework that integrates static analysis with large language model (LLM) agents to detect atomicity violations in real-world programs. Clover first performs static analysis to extract critical code snippets and operation information. It then initiates a multi-agent process, where the expert agent leverages domain-specific knowledge to detect atomicity violations, which are subsequently validated by the judge agent. Evaluations on RaceBench 2.1, SV-COMP, and RWIP demonstrate that Clover achieves a precision/recall of 92.3%/86.6%, outperforming existing approaches by 27.4-118.2% on F1-score.
Scalable Safe Multi-Agent Reinforcement Learning for Multi-Agent System
Du, Haikuo, Gou, Fandi, Cai, Yunze
Safety and scalability are two critical challenges faced by practical Multi-Agent Systems (MAS). However, existing Multi-Agent Reinforcement Learning (MARL) algorithms that rely solely on reward shaping are ineffective in ensuring safety, and their scalability is rather limited due to the fixed-size network output. To address these issues, we propose a novel framework, Scalable Safe MARL (SS-MARL), to enhance the safety and scalability of MARL methods. Leveraging the inherent graph structure of MAS, we design a multi-layer message passing network to aggregate local observations and communications of varying sizes. Furthermore, we develop a constrained joint policy optimization method in the setting of local observation to improve safety. Simulation experiments demonstrate that SS-MARL achieves a better trade-off between optimality and safety compared to baselines, and its scalability significantly outperforms the latest methods in scenarios with a large number of agents.
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents
Agashe, Saaket, Wong, Kyle, Tu, Vincent, Yang, Jiachen, Li, Ang, Wang, Xin Eric
Computer use agents automate digital tasks by directly interacting with graphical user interfaces (GUIs) on computers and mobile devices, offering significant potential to enhance human productivity by completing an open-ended space of user queries. However, current agents face significant challenges: imprecise grounding of GUI elements, difficulties with long-horizon task planning, and performance bottlenecks from relying on single generalist models for diverse cognitive tasks. To this end, we introduce Agent S2, a novel compositional framework that delegates cognitive responsibilities across various generalist and specialist models. We propose a novel Mixture-of-Grounding technique to achieve precise GUI localization and introduce Proactive Hierarchical Planning, dynamically refining action plans at multiple temporal scales in response to evolving observations. Evaluations demonstrate that Agent S2 establishes new state-of-the-art (SOTA) performance on three prominent computer use benchmarks. Specifically, Agent S2 achieves 18.9% and 32.7% relative improvements over leading baseline agents such as Claude Computer Use and UI-TARS on the OSWorld 15-step and 50-step evaluation. Moreover, Agent S2 generalizes effectively to other operating systems and applications, surpassing previous best methods by 52.8% on WindowsAgentArena and by 16.52% on AndroidWorld relatively. Code available at https://github.com/simular-ai/Agent-S.
First Field-Trial Demonstration of L4 Autonomous Optical Network for Distributed AI Training Communication: An LLM-Powered Multi-AI-Agent Solution
Zhang, Yihao, Qiu, Qizhi, Liu, Xiaomin, Fu, Dianxuan, Liu, Xingyu, Fei, Leyan, Cheng, Yuming, Yi, Lilin, Hu, Weisheng, Zhuge, Qunbi
We demonstrate the first cross-domain cross-layer level-4 autonomous optical network via a multi-AI-agent system. Field trials show ~98% task completion rate across the distributed AI training lifecycle, 3.2× higher than single agents using state-of-the-art LLMs.
Since collaborative resource utilization across distributed facilities is essential for training workloads, this evolution introduces significant complexity in network management, as controllers must operate across multiple domains, spanning from intra- and inter-datacenters to long-haul wide area networks. Moreover, distributed training imposes stringent reliability requirements as it should restart from the checkpoint if a failure happens [2]. Therefore, in terms of distributed training communications, resilient operations and rapid fault recovery are essential.
Remember, but also, Forget: Bridging Myopic and Perfect Recall Fairness with Past-Discounting
Dynamic resource allocation in multi-agent settings often requires balancing efficiency with fairness over time--a challenge inadequately addressed by conventional, myopic fairness measures. Motivated by behavioral insights that human judgments of fairness evolve with temporal distance, we introduce a novel framework for temporal fairness that incorporates past-discounting mechanisms. By applying a tunable discount factor to historical utilities, our approach interpolates between instantaneous and perfect-recall fairness, thereby capturing both immediate outcomes and long-term equity considerations. Beyond aligning more closely with human perceptions of fairness, this past-discounting method ensures that the augmented state space remains bounded, significantly improving computational tractability in sequential decision-making settings. We detail the formulation of discounted-recall fairness in both additive and averaged utility contexts, illustrate its benefits through practical examples, and discuss its implications for designing balanced, scalable resource allocation strategies.
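The interpolation between instantaneous and perfect-recall fairness can be sketched with a simple recursive update. The Python below is an assumed form, not necessarily the paper's exact formulation: the additive variant accumulates `z = gamma * z + u` per step, and the averaged variant divides by the accumulated discount weights. The function names and the symbol `gamma` are hypothetical.

```python
def discounted_recall(utilities, gamma):
    """Additive past-discounted utility: older rewards are scaled by gamma per step."""
    z = 0.0
    for u in utilities:
        z = gamma * z + u
    return z


def discounted_recall_average(utilities, gamma):
    """Averaged variant: discounted sum divided by the sum of discount weights."""
    z, w = 0.0, 0.0
    for u in utilities:
        z = gamma * z + u
        w = gamma * w + 1.0
    return z / w if w else 0.0


history = [1.0, 1.0, 1.0]  # one agent's utility over three time steps

myopic = discounted_recall(history, gamma=0.0)    # only the latest step counts
blended = discounted_recall(history, gamma=0.5)   # recent steps weigh more
perfect = discounted_recall(history, gamma=1.0)   # full history, undiscounted
```

With `gamma = 0` only the most recent utility matters, and with `gamma = 1` the whole history is summed. For any `gamma` strictly below 1 and per-step utilities bounded by some `u_max`, the additive value never exceeds `u_max / (1 - gamma)`, which is one way to see why the augmented state stays bounded.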
HERA: Hybrid Edge-cloud Resource Allocation for Cost-Efficient AI Agents
Liu, Shiyi, Shen, Haiying, Che, Shuai, Ghandi, Mahdi, Li, Mingqin
In the realm of AI, large language models (LLMs) like GPT-4, central to the operation of AI agents, predominantly operate in the cloud, incurring high operational costs. With local-based small language models (SLMs) becoming more accurate, the necessity of cloud-exclusive processing is being reconsidered. An AI agent's response to a user's request comprises a series of subtasks or iterations. Existing approaches only allocate a single request between SLM and LLM to ensure their outputs are similar, but adopting this approach in the AI agent scenario for assigning each subtask is not effective since SLM will output a different subsequent subtask, which affects the accuracy of the final output. In this paper, we first conduct experimental analysis to understand the features of AI agent operations. Leveraging our findings, we propose the Adaptive Iteration-level Model Selector (AIMS), a lightweight scheduler to automatically partition AI agent's subtasks between local-based SLM and cloud-based LLM. AIMS considers the varying subtask features and strategically decides the location for each subtask in order to use SLM as much as possible while attaining the accuracy level. Our experimental results demonstrate that AIMS increases accuracy by up to 9.1% and SLM usage by up to 10.8% compared to HybridLLM. It offloads 45.67% of subtasks to a local SLM while attaining similar accuracy on average compared with the cloud-only LLM approach.