Agents
Eval-PPO: Building an Efficient Threat Evaluator Using Proximal Policy Optimization
Sun, Wuzhou, Li, Siyi, Zou, Qingxiang, Liao, Zixing
In various game scenarios, selecting a fixed number of targets from multiple enemy units is an extremely challenging task. This difficulty stems from the complex relationship between the threat levels of enemy units and their features, which complicates the design of rule-based evaluators. Moreover, traditional supervised learning methods are difficult to apply to this threat evaluation problem because no explicit labels are available during training. In this study, we redefine the threat evaluation problem as a reinforcement learning task and introduce Eval-PPO, an efficient evaluator training algorithm based on the Proximal Policy Optimization (PPO) algorithm. Through systematic training, Eval-PPO integrates multidimensional enemy features with the state information of friendly units, thereby achieving precise threat assessment. Compared with rule-based methods, Eval-PPO improves the average success rate by 17.84%.
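Eval-PPO is built on PPO, whose core is the clipped surrogate objective. The sketch below shows that generic PPO-Clip objective for a batch of log-probabilities and advantages. It is not Eval-PPO's training code: the threat features, reward, and network are not given in the abstract, and the helper `select_top_k`, which picks the k highest-scoring enemy units, is a hypothetical illustration of "selecting a fixed number of targets".

```python
import math

def ppo_clip_objective(new_logps, old_logps, advantages, eps=0.2):
    """Mean PPO-Clip surrogate objective (to be maximized) over a batch of samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)                # r = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(r, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)

def select_top_k(threat_scores, k):
    """Hypothetical helper: indices of the k enemy units with the highest threat scores."""
    ranked = sorted(range(len(threat_scores)), key=lambda i: threat_scores[i], reverse=True)
    return ranked[:k]

# Usage with made-up numbers: three enemy units, engage the two most threatening.
print(select_top_k([0.1, 0.9, 0.5], 2))  # → [1, 2]
```

Taking the minimum of the clipped and unclipped terms means a policy update gains nothing from pushing the probability ratio beyond 1 ± eps when the advantage is positive, which keeps each update close to the data-collecting policy.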
Formation Control of Multi-agent System with Local Interaction and Artificial Potential Field
Zhao, Luoyin, Yan, Zheping, Wang, Yuqing, Yeow, Raye Chen-Hua
A novel local interaction control method (LICM) is proposed in this paper to realize formation control of a multi-agent system (MAS). By coupling the advantages of information consensus and the leader-follower framework, a local interaction leader-follower (LILF) structure is provided in which agents obtain the leader's state information by interacting with their neighbours; this reduces the system's communication overhead and its dependence on any single node of the topology. In addition, the artificial potential field (APF) method is introduced to achieve obstacle avoidance and collision avoidance between agents. Inspired by the stress response of animals, a stress response mechanism-artificial potential field (SRM-APF) is proposed, which is triggered when the local minimum problem of APF occurs. Finally, simulation experiments with three formation shapes (triangular, square and hexagonal) validate the effectiveness of the proposed method.
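To make the APF part concrete, here is a minimal sketch of the classic attractive-plus-repulsive potential (quadratic attraction, Khatib-style repulsion within an influence radius `rho0`). The gains, the radius, and the helper `follower_goal` (a follower's slot as the estimated leader position plus a formation offset) are assumptions for illustration; the paper's exact potentials and its SRM-APF mechanism are not specified in the abstract and are not modelled here.

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, rho0=2.0):
    """Negative gradient of a classic attractive + repulsive potential at a 2-D position."""
    # Attractive term: -grad of 0.5 * k_att * ||pos - goal||^2
    fx = -k_att * (pos[0] - goal[0])
    fy = -k_att * (pos[1] - goal[1])
    # Repulsive term: -grad of 0.5 * k_rep * (1/rho - 1/rho0)^2, active only for rho <= rho0
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        rho = math.hypot(dx, dy)
        if 0.0 < rho <= rho0:
            mag = k_rep * (1.0 / rho - 1.0 / rho0) / rho ** 2
            fx += mag * dx / rho  # push away from the obstacle along (pos - obstacle)
            fy += mag * dy / rho
    return fx, fy

def follower_goal(leader_pos, offset):
    """Hypothetical: a follower's target slot is the (estimated) leader position plus its offset."""
    return (leader_pos[0] + offset[0], leader_pos[1] + offset[1])
```

Other agents can be passed in as `obstacles` to obtain inter-agent collision avoidance. A local minimum is a point where the attractive and repulsive terms cancel before the goal is reached, so the returned force is near zero; that is the situation in which the paper's SRM-APF is triggered.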
Multi-Agent Systems Execute Arbitrary Malicious Code
Triedman, Harold, Jha, Rishi, Shmatikov, Vitaly
Multi-agent systems coordinate LLM-based agents to perform tasks on users' behalf. In real-world applications, multi-agent systems will inevitably interact with untrusted inputs, such as malicious Web content, files, email attachments, etc. Using several recently proposed multi-agent frameworks as concrete examples, we demonstrate that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities. This results in a complete security breach, up to execution of arbitrary malicious code on the user's device and/or exfiltration of sensitive data from the user's containerized environment. We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection, and even if they refuse to perform harmful actions.
Automation and Feature Selection Enhancement with Reinforcement Learning (RL)
Effective feature selection, representation and transformation are principal steps in machine learning for improving prediction accuracy, model generalization and computational efficiency. Reinforcement learning offers a new perspective on the balanced exploration of the optimal feature subset, using both multi-agent[1] and single-agent models. Interactive reinforcement learning integrated with decision trees improves feature knowledge, state representation and selection efficiency, while diversified teaching strategies improve both selection quality and efficiency. The state representation can be further enhanced by scanning features sequentially and using a convolutional auto-encoder[2]. Monte Carlo-based reinforced feature selection (MCRFS)[3], a single-agent feature selection method, reduces the computational burden by incorporating early-stopping and reward-level interactive strategies. A dual-agent RL framework[4] has also been introduced that jointly selects features and instances, capturing the interactions between them; this enables the agents to navigate complex data spaces. To outperform traditional feature engineering, a self-optimizing framework[5] uses cascading reinforced agents to iteratively improve the feature space. The blend of reinforcement learning, multi-agent systems, and bandit-based approaches offers promising paths toward scalable and interpretable machine learning solutions for high-dimensional data and challenging predictive tasks.
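As a rough illustration of the multi-agent formulation, the sketch below gives each feature its own agent with a keep/drop action, trains all agents on one shared reward, and updates action values Monte Carlo style. It is a simplified example in the spirit of [1], not a reproduction of any cited method: MCRFS's early stopping, the dual-agent framework [4] and the cascading agents [5] are not modelled. `reward_fn` is a hypothetical stand-in; in practice it might be the validation accuracy of a downstream model minus a subset-size penalty.

```python
import random

def marl_feature_selection(n_features, reward_fn, episodes=500, eps=0.1, alpha=0.1, seed=0):
    """One keep/drop agent per feature, trained on a shared reward with Monte Carlo updates."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_features)]  # q[i][a]: agent i's value for a (0 = drop, 1 = keep)
    for _ in range(episodes):
        actions = []
        for i in range(n_features):
            if rng.random() < eps:                 # explore
                actions.append(rng.randint(0, 1))
            else:                                  # exploit; ties default to "drop"
                actions.append(1 if q[i][1] > q[i][0] else 0)
        subset = [i for i, a in enumerate(actions) if a == 1]
        reward = reward_fn(subset)                 # one reward shared by every agent
        for i, a in enumerate(actions):
            q[i][a] += alpha * (reward - q[i][a])  # move estimate toward the observed return
    return [i for i in range(n_features) if q[i][1] > q[i][0]]

# Toy reward (hypothetical): features 0 and 1 help, every other kept feature costs 0.3.
toy_reward = lambda s: sum(1.0 if i in (0, 1) else -0.3 for i in s)
print(marl_feature_selection(6, toy_reward))
```

The shared reward is what makes this a cooperative multi-agent problem: each agent's value estimate is noisy because it also reflects the other agents' simultaneous choices, which is one motivation for the variance-reduction and interaction strategies mentioned above.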
ICCO: Learning an Instruction-conditioned Coordinator for Language-guided Task-aligned Multi-robot Control
Yano, Yoshiki, Shibata, Kazuki, Kokshoorn, Maarten, Matsubara, Takamitsu
Recent advances in Large Language Models (LLMs) have permitted the development of language-guided multi-robot systems, which allow robots to execute tasks based on natural language instructions. However, achieving effective coordination in distributed multi-agent environments remains challenging due to (1) misalignment between instructions and task requirements and (2) inconsistency in robot behaviors when they independently interpret ambiguous instructions. To address these challenges, we propose Instruction-Conditioned Coordinator (ICCO), a Multi-Agent Reinforcement Learning (MARL) framework designed to enhance coordination in language-guided multi-robot systems. ICCO consists of a Coordinator agent and multiple Local Agents, where the Coordinator generates Task-Aligned and Consistent Instructions (TACI) by integrating language instructions with environmental states, ensuring task alignment and behavioral consistency. The Coordinator and Local Agents are jointly trained to optimize a reward function that balances task efficiency and instruction following. A Consistency Enhancement Term is added to the learning objective to maximize mutual information between instructions and robot behaviors, further improving coordination. Simulation and real-world experiments validate the effectiveness of ICCO in achieving language-guided task-aligned multi-robot control. The demonstration can be found at https://yanoyoshiki.github.io/ICCO/.
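The Consistency Enhancement Term maximizes the mutual information between instructions and robot behaviors. The sketch below only shows what that quantity measures, using a plug-in estimate over discrete (instruction, behavior) samples; the labels in the example are hypothetical. A differentiable training objective would typically rely on a variational bound instead, and the abstract does not say which estimator ICCO uses.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(Z; A) in nats from (instruction, behavior) samples."""
    n = len(pairs)
    joint = Counter(pairs)                  # counts of (z, a)
    pz = Counter(z for z, _ in pairs)       # marginal counts of instructions
    pa = Counter(a for _, a in pairs)       # marginal counts of behaviors
    mi = 0.0
    for (z, a), c in joint.items():
        p_za = c / n
        mi += p_za * math.log(p_za / ((pz[z] / n) * (pa[a] / n)))
    return mi

# Behavior fully determined by the instruction: I = log 2 for two equally likely instructions.
print(mutual_information([("go_left", "L"), ("go_right", "R")]))
```

If every robot behaves the same regardless of the instruction, the estimate is zero; behaviors that track the instruction consistently raise it, which is the coordination property the term rewards.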
Exploration of VLMs for Driver Monitoring Systems Applications
Cañas, Paola Natalia, Nieto, Marcos, Otaegui, Oihana, Rodríguez, Igor
Vision-language models (VLMs) have the potential to revolutionize driver and in-cabin monitoring by offering a more holistic understanding of the driving scene. Rather than focusing on individual variables, VLMs are trained to describe the entire scene, considering all crucial elements. This comprehensive approach allows them to construct a coherent narrative around the scene, leading to a more thorough assessment of the driver's situation. Despite these potential benefits, there is a notable lack of scientific research exploring the application of VLMs in this field. We aim to conduct an initial exploration of how these systems perform in tasks such as distraction detection, drowsiness detection, and gaze estimation. By evaluating their performance, we hope to determine whether they can match or even surpass state-of-the-art models, or to identify areas where they fall short. To achieve this, we will use data from the Driver Monitoring Dataset (DMD), which contains extensive material of drivers in various states of drowsiness and distraction, including drivers performing actions that imply distraction (such as texting, talking on the phone, or drinking water) as well as driving safely, together with detailed gaze annotations. By integrating VLMs into driver monitoring systems (DMS), we expect the model to achieve better scene comprehension, enabling it to provide detailed descriptions and respond to queries through Visual Question Answering (VQA) tasks.
Ergodic exploration of dynamic distribution
Lanča, Luka, Jakac, Karlo, Calinon, Sylvain, Ivić, Stefan
This research addresses the challenge of performing search missions in dynamic environments, particularly for drifting targets whose movement is dictated by a flow field. This is accomplished through a dynamical system that integrates two partial differential equations: one governing the dynamics and uncertainty of the probability distribution, and the other regulating the potential field for ergodic multi-agent search. The target probability field evolves in response to the target dynamics imposed by the environment and to the sensing effort already expended, while being explored by multiple robot agents guided by the gradient of the potential field. The proposed methodology was tested on two simulated search scenarios. The first features a synthetically generated domain and shows better performance than the baseline method with a static target probability over a range of agent-to-flow-field velocity ratios. The second represents a realistic sea search and rescue mission in which the search start is delayed, the search is performed in multiple robot flight missions, and the procedure for compensating target drift uncertainty is demonstrated. Furthermore, the proposed method provides an accurate survey completion metric, based on the known detection/sensing parameters, that correlates with the actual number of targets found independently.
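The first of the two PDEs moves target probability with the flow and removes it where agents have sensed. The sketch below shows that idea in the simplest possible setting, under several assumptions: a 1-D periodic grid, a constant flow speed `u` (the paper uses a flow field), a first-order upwind advection scheme chosen here for simplicity, no diffusion or uncertainty growth, and a fixed detection probability over the cells an agent covers. `survey_completion` is a hypothetical measure and is not the paper's completion metric; the ergodic potential-field PDE is not sketched.

```python
def advect_upwind(p, u, dx, dt):
    """One first-order upwind step of dp/dt + u * dp/dx = 0 on a periodic 1-D grid (constant u)."""
    c = u * dt / dx  # Courant number
    if abs(c) > 1.0:
        raise ValueError("unstable step: need |u| * dt / dx <= 1")
    n = len(p)
    if c >= 0.0:  # flow toward +x: difference with the left neighbour (p[-1] wraps around)
        return [p[i] - c * (p[i] - p[i - 1]) for i in range(n)]
    # flow toward -x: difference with the right neighbour
    return [p[i] - c * (p[(i + 1) % n] - p[i]) for i in range(n)]

def apply_sensing(p, covered_cells, detect_prob):
    """Unnormalized non-detection update: mass in sensed cells is scaled by the miss probability."""
    return [pi * (1.0 - detect_prob) if i in covered_cells else pi for i, pi in enumerate(p)]

def survey_completion(p_initial, p_current):
    """Hypothetical measure: fraction of the initial target probability already searched away."""
    return 1.0 - sum(p_current) / sum(p_initial)

p0 = [0.0, 1.0, 0.0, 0.0]
p1 = advect_upwind(p0, u=1.0, dx=1.0, dt=0.5)  # → [0.0, 0.5, 0.5, 0.0]
p2 = apply_sensing(p1, {1}, detect_prob=0.8)   # an agent senses cell 1
print(survey_completion(p0, p2))
```

Advection alone conserves total probability, so any drop in mass comes only from sensing; that separation is what lets a completion measure be computed from the known detection parameters.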
Revisiting FastMap: New Applications
FastMap was first introduced in the Data Mining community for generating Euclidean embeddings of complex objects. In this dissertation, we first present FastMap for generating Euclidean embeddings of graphs in near-linear time, such that the pairwise Euclidean distances approximate a desired graph-based distance function on the vertices. We then apply the graph version of FastMap to efficiently solve various graph-theoretic problems of significant interest in AI, including facility location, top-K centrality computations, community detection and block modeling, and graph convex hull computations. We also present a novel learning framework, called FastMapSVM, that combines FastMap and Support Vector Machines. We then apply FastMapSVM to predict the satisfiability of Constraint Satisfaction Problems and to classify seismograms in earthquake science.
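The sketch below applies the original Faloutsos and Lin FastMap projection to shortest-path distances on a graph: for each dimension it picks two far-apart pivots with a farthest-point heuristic, projects every vertex onto the line through them with the cosine law, and subtracts what that dimension explains from the remaining squared distances. Only a few Dijkstra runs are needed per dimension, which is where the near-linear running time comes from. The pivot heuristic, the `pivot_iters` parameter and the use of squared distances are assumptions of this sketch; the dissertation's graph version may use different distance transforms and pivot rules. The graph is assumed to be connected and undirected.

```python
import heapq
import math

def dijkstra(adj, src):
    """Shortest-path distances from src; adj maps vertex -> list of (neighbour, weight)."""
    dist = {v: math.inf for v in adj}
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def fastmap(adj, k=2, pivot_iters=3):
    """Embed the vertices of a connected undirected graph into k Euclidean dimensions."""
    nodes = list(adj)
    coords = {v: [] for v in nodes}

    def residual_sq(src):
        # Squared graph distance minus the part already explained by earlier dimensions.
        dg = dijkstra(adj, src)
        return {v: max(dg[v] ** 2 - sum((x - y) ** 2 for x, y in zip(coords[src], coords[v])), 0.0)
                for v in nodes}

    for _ in range(k):
        # Farthest-point heuristic for the two pivots a and b.
        a = b = nodes[0]
        for _ in range(pivot_iters):
            da = residual_sq(a)
            b = max(nodes, key=da.get)
            db = residual_sq(b)
            a = max(nodes, key=db.get)
        da, db = residual_sq(a), residual_sq(b)
        dab_sq = da[b]
        if dab_sq == 0.0:  # nothing left to explain in this dimension
            for v in nodes:
                coords[v].append(0.0)
            continue
        dab = math.sqrt(dab_sq)
        for v in nodes:
            # Cosine-law projection onto the line through the pivots.
            coords[v].append((da[v] + dab_sq - db[v]) / (2.0 * dab))
    return coords

# A 3-vertex path with unit weights embeds exactly on a line.
path = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0)]}
print(fastmap(path, k=1))  # → {0: [0.0], 1: [1.0], 2: [2.0]}
```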
DeskVision: Large Scale Desktop Region Captioning for Advanced GUI Agents
Xu, Yibin, Yang, Liang, Chen, Hao, Wang, Hua, Chen, Zhi, Tang, Yaohua
The scarcity of graphical user interface (GUI) data is a significant barrier to the development of GUI agents today, especially for desktop/computer-use scenarios. To address this, we propose an automated GUI data generation pipeline, AutoCaptioner, which generates data with rich descriptions while minimizing human effort. Using AutoCaptioner, we created a novel large-scale desktop GUI dataset, DeskVision, along with the largest desktop test benchmark, DeskVision-Eval, which reflects daily usage and covers diverse systems and UI elements, each with rich descriptions. With DeskVision, we train a new GUI understanding model, GUIExplorer. Results show that GUIExplorer achieves state-of-the-art (SOTA) performance in understanding and grounding visual elements without the need for complex architectural designs. We further validate the effectiveness of the DeskVision dataset through ablation studies on various large vision-language models (LVLMs). We believe that AutoCaptioner and DeskVision will significantly advance the development of GUI agents, and we will open-source them for the community.
Unicorn: A Universal and Collaborative Reinforcement Learning Approach Towards Generalizable Network-Wide Traffic Signal Control
Zhang, Yifeng, Liu, Yilin, Gong, Ping, Li, Peizhuo, Fan, Mingfeng, Sartoretti, Guillaume
Adaptive traffic signal control (ATSC) is crucial for reducing congestion, maximizing throughput, and improving mobility in rapidly growing urban areas. Recent advancements in parameter-sharing multi-agent reinforcement learning (MARL) have greatly enhanced the scalable and adaptive optimization of complex, dynamic flows in large-scale homogeneous networks. However, the inherent heterogeneity of real-world traffic networks, with their varied intersection topologies and interaction dynamics, poses substantial challenges to achieving scalable and effective ATSC across different traffic scenarios. To address these challenges, we present Unicorn, a universal and collaborative MARL framework designed for efficient and adaptable network-wide ATSC. Specifically, we first propose a unified approach to map the states and actions of intersections with varying topologies into a common structure based on traffic movements. Next, we design a Universal Traffic Representation (UTR) module with a decoder-only network for general feature extraction, enhancing the model's adaptability to diverse traffic scenarios. Additionally, we incorporate an Intersection Specifics Representation (ISR) module, designed to identify key latent vectors that represent each intersection's unique topology and traffic dynamics through variational inference techniques. To further refine these latent representations, we employ a contrastive learning approach in a self-supervised manner, which enables better differentiation of intersection-specific features. Moreover, we integrate the state-action dependencies of neighboring agents into policy optimization, which effectively captures dynamic agent interactions and facilitates efficient regional collaboration. Our results show that Unicorn outperforms other methods across various evaluation metrics, highlighting its potential in complex, dynamic traffic networks.
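The first step, mapping intersections with different topologies onto a common movement-based structure, can be sketched as a fixed set of movement slots with zero-padding and a mask. This assumes 12 slots (four approaches times left, through and right turns) and per-movement features such as queue length or waiting vehicles; these are hypothetical choices, since the abstract does not give Unicorn's actual feature set or movement ordering.

```python
APPROACHES = ("N", "E", "S", "W")
TURNS = ("L", "T", "R")  # left, through, right
MOVEMENTS = [(a, t) for a in APPROACHES for t in TURNS]  # 12 canonical movement slots

def movement_state(movement_feats, feat_dim):
    """Place per-movement features into a fixed 12-slot layout; missing movements are zero-padded and masked."""
    feats, mask = [], []
    for mv in MOVEMENTS:
        if mv in movement_feats:
            row = [float(x) for x in movement_feats[mv]]
            if len(row) != feat_dim:
                raise ValueError(f"movement {mv} has {len(row)} features, expected {feat_dim}")
            feats.append(row)
            mask.append(1)
        else:
            feats.append([0.0] * feat_dim)
            mask.append(0)
    return feats, mask

def phase_action(phase_movements):
    """Encode a signal phase as a 12-slot binary vector of the movements it serves."""
    return [1 if mv in phase_movements else 0 for mv in MOVEMENTS]

# A T-junction simply leaves the slots of its missing approach masked out.
feats, mask = movement_state({("N", "T"): [3, 1], ("W", "R"): [0, 2]}, feat_dim=2)
print(mask)
print(phase_action({("N", "T"), ("S", "T")}))  # north-south through phase
```

Because every intersection yields the same tensor shape, a single shared network can consume all of them, with the mask telling it which slots are real.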