Goto

Collaborating Authors

 Agents


Diverse and Adaptive Behavior Curriculum for Autonomous Driving: A Student-Teacher Framework with Multi-Agent RL

arXiv.org Artificial Intelligence

Autonomous driving faces challenges in navigating complex real-world traffic, requiring safe handling of both common and critical scenarios. Reinforcement learning (RL), a prominent method in end-to-end driving, enables agents to learn through trial and error in simulation. However, RL training often relies on rule-based traffic scenarios, limiting generalization. Additionally, current scenario generation methods focus heavily on critical scenarios, neglecting a balance with routine driving behaviors. Curriculum learning, which progressively trains agents on increasingly complex tasks, is a promising approach to improving the robustness and coverage of RL driving policies. However, existing research mainly emphasizes manually designed curricula, focusing on scenery and actor placement rather than traffic behavior dynamics. This work introduces a novel student-teacher framework for automatic curriculum learning. The teacher, a graph-based multi-agent RL component, adaptively generates traffic behaviors across diverse difficulty levels. An adaptive mechanism adjusts task difficulty based on student performance, ensuring exposure to behaviors ranging from common to critical. The student, though exchangeable, is realized as a deep RL agent with partial observability, reflecting real-world perception constraints. Results demonstrate the teacher's ability to generate diverse traffic behaviors. The student, trained with automatic curricula, outperformed agents trained on rule-based traffic, achieving higher rewards and exhibiting balanced, assertive driving.


Game-Theoretic Gradient Control for Robust Neural Network Training

arXiv.org Artificial Intelligence

Feed-forward neural networks (FFNNs) are vulnerable to input noise, reducing prediction performance. Existing regularization methods like dropout often alter network architecture or overlook neuron interactions. This study aims to enhance FFNN noise robustness by modifying backpropagation, interpreted as a multi-agent game, and exploring controlled target variable noising. Our "gradient dropout" selectively nullifies hidden layer neuron gradients with probability 1 - p during backpropagation, while keeping forward passes active. This is framed within compositional game theory. Additionally, target variables were perturbed with white noise or stable distributions. Experiments on ten diverse tabular datasets show varying impacts: improvement or diminishing of robustness and accuracy, depending on dataset and hyperparameters. Notably, on regression tasks, gradient dropout (p = 0.9) combined with stable distribution target noising significantly increased input noise robustness, evidenced by flatter MSE curves and more stable SMAPE values. These results highlight the method's potential, underscore the critical role of adaptive parameter tuning, and open new avenues for analyzing neural networks as complex adaptive systems exhibiting emergent behavior within a game-theoretic framework.


OS-MAP: How Far Can Computer-Using Agents Go in Breadth and Depth?

arXiv.org Artificial Intelligence

Computer-using agents have shown strong potential to boost human productivity and enable new application forms across platforms. While recent advances have led to usable applications, existing benchmarks fail to account for the internal task heterogeneity and the corresponding agent capabilities, as well as their alignment with actual user demands-hindering both targeted capability development and the reliable transition of research progress into practical deployment. To bridge the gap, we present OS-MAP, a benchmark for daily computer-using automation that organizes its 416 realistic tasks across 15 applications along two key dimensions: a five-level taxonomy of automation and a generalization scope derived from a real-world user demand hierarchy. To enable fine-grained analysis of required capabilities and alignment with real-world scenarios, OS-MAP evaluates agents along two dimensions: automation level across a five-level taxonomy, and generalization scope across a demand hierarchy. This design captures varying levels of required agent autonomy and generalization, forming a performance-generalization evaluation matrix for structured and comprehensive assessment. Experiments show that even State-of-the-Art agents with VLM backbones struggle with higher-level tasks involving perception, reasoning, and coordination-highlighting the need for a deeper understanding of current strengths and limitations to drive the future progress in computer-using agents research and deployment. All code, environments, baselines, and data are publicly available at https://github.com/OS-Copilot/OS-Map.


Heterogeneous Risk Management Using a Multi-Agent Framework for Supply Chain Disruption Response

arXiv.org Artificial Intelligence

In the highly complex and stochastic global, supply chain environments, local enterprise agents seek distributed and dynamic strategies for agile responses to disruptions. Existing literature explores both centralized and distributed approaches, while most work neglects temporal dynamics and the heterogeneity of the risk management of individual agents. To address this gap, this letter presents a heterogeneous risk management mechanism to incorporate uncertainties and risk attitudes into agent communication and decision-making strategy. Hence, this approach empowers enterprises to handle disruptions in stochastic environments in a distributed way, and in particular in the context of multi-agent control and management. Through a simulated case study, we showcase the feasibility and effectiveness of the proposed approach under stochastic settings and how the decision of disruption responses changes when agents hold various risk attitudes.


Dynamic distributed decision-making for resilient resource reallocation in disrupted manufacturing systems

arXiv.org Artificial Intelligence

The COVID-19 pandemic brings many unexpected disruptions, such as frequently shifting markets and limited human workforce, to manufacturers. To stay competitive, flexible and real-time manufacturing decision-making strategies are needed to deal with such highly dynamic manufacturing environments. One essential problem is dynamic resource allocation to complete production tasks, especially when a resource disruption (e.g., machine breakdown) occurs. Though multi-agent methods have been proposed to solve the problem in a flexible and agile manner, the agent internal decision-making process and resource uncertainties have rarely been studied. This work introduces a model-based resource agent (RA) architecture that enables effective agent coordination and dynamic agent decision-making. Based on the RA architecture, a rescheduling strategy that incorporates risk assessment via a clustering agent coordination strategy is also proposed. A simulation-based case study is implemented to demonstrate dynamic rescheduling using the proposed multi-agent framework. The results show that the proposed method reduces the computational efforts while losing some throughput optimality compared to the centralized method. Furthermore, the case study illustrates that incorporating risk assessment into rescheduling decision-making improves the throughput.


A Distributed Approach for Agile Supply Chain Decision-Making Based on Network Attributes

arXiv.org Artificial Intelligence

In recent years, the frequent occurrence of disruptions has had a negative impact on global supply chains. To stay competitive, enterprises strive to remain agile through the implementation of efficient and effective decision-making strategies in reaction to disruptions. A significant effort has been made to develop these agile disruption mitigation approaches, leveraging both centralized and distributed decision-making strategies. Though trade-offs of centralized and distributed approaches have been analyzed in existing studies, no related work has been found on understanding supply chain performance based on the network attributes of the disrupted supply chain entities. In this paper, we characterize supply chains from a capability and network topological perspective and investigate the use of a distributed decision-making approach based on classical multi-agent frameworks. The performance of the distributed framework is evaluated through a comprehensive case study that investigates the performance of the supply chain as a function of the network structure and agent attributes within the network in the presence of a disruption. Comparison to a centralized decision-making approach highlights trade-offs between performance, computation time, and network communication based on the decision-making strategy and network architecture. Practitioners can use the outcomes of our studies to design response strategies based on agent capabilities, network attributes, and desired supply chain performance.


Agent0: Leveraging LLM Agents to Discover Multi-value Features from Text for Enhanced Recommendations

arXiv.org Artificial Intelligence

Large language models (LLMs) and their associated agent-based frameworks have significantly advanced automated information extraction, a critical component of modern recommender systems. While these multitask frameworks are widely used in code generation, their application in data-centric research is still largely untapped. This paper presents Agent0, an LLM-driven, agent-based system designed to automate information extraction and feature construction from raw, unstructured text. Categorical features are crucial for large-scale recommender systems but are often expensive to acquire. Agent0 coordinates a group of interacting LLM agents to automatically identify the most valuable text aspects for subsequent tasks (such as models or AutoML pipelines). Beyond its feature engineering capabilities, Agent0 also offers an automated prompt-engineering tuning method that utilizes dynamic feedback loops from an oracle. Our findings demonstrate that this closed-loop methodology is both practical and effective for automated feature discovery, which is recognized as one of the most challenging phases in current recommender system development.


GMM-Based Time-Varying Coverage Control

arXiv.org Artificial Intelligence

In coverage control problems that involve time-varying density functions, the coverage control law depends on spatial integrals of the time evolution of the density function. The latter is often neglected, replaced with an upper bound or calculated as a numerical approximation of the spatial integrals involved. In this paper, we consider a special case of time-varying density functions modeled as Gaussian Mixture Models (GMMs) that evolve with time via a set of time-varying sources (with known corresponding velocities). By imposing this structure, we obtain an efficient time-varying coverage controller that fully incorporates the time evolution of the density function. We show that the induced trajectories under our control law minimise the overall coverage cost. We elicit the structure of the proposed controller and compare it with a classical time-varying coverage controller, against which we benchmark the coverage performance in simulation. Furthermore, we highlight that the computationally efficient and distributed nature of the proposed control law makes it ideal for multi-vehicle robotic applications involving time-varying coverage control problems. We employ our method in plume monitoring using a swarm of drones. In an experimental field trial we show that drones guided by the proposed controller are able to track a simulated time-varying chemical plume in a distributed manner.


REPRO-Bench: Can Agentic AI Systems Assess the Reproducibility of Social Science Research?

arXiv.org Artificial Intelligence

Assessing the reproducibility of social science papers is essential for promoting rigor in research processes, but manual assessment is costly. With recent advances in agentic AI systems (i.e., AI agents), we seek to evaluate their capability to automate this process. However, existing benchmarks for reproducing research papers (1) focus solely on reproducing results using provided code and data without assessing their consistency with the paper, (2) oversimplify real-world scenarios, and (3) lack necessary diversity in data formats and programming languages. To address these issues, we introduce REPRO-Bench, a collection of 112 task instances, each representing a social science paper with a publicly available reproduction report. The agents are tasked with assessing the reproducibility of the paper based on the original paper PDF and the corresponding reproduction package. REPRO-Bench features end-to-end evaluation tasks on the reproducibility of social science papers with complexity comparable to real-world assessments. We evaluate three representative AI agents on REPRO-Bench, with the best-performing agent achieving an accuracy of only 21.4%. Building on our empirical analysis, we develop REPRO-Agent, which improves the highest accuracy achieved by existing agents by 71%. We conclude that more advanced AI agents should be developed to automate real-world reproducibility assessment. REPRO-Bench is publicly available at https://github.com/uiuc-kang-lab/REPRO-Bench.


Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning via Incorporating Generalized Human Expertise

arXiv.org Artificial Intelligence

-- Efficient exploration in multi-agent reinforcement learning (MARL) is a challenging problem when receiving only a team reward, especially in environments with sparse rewards. A powerful method to mitigate this issue involves crafting dense individual rewards to guide the agents toward efficient exploration. However, individual rewards generally rely on manually engineered shaping-reward functions that lack high-order intelligence, thus it behaves ineffectively than humans regarding learning and generalization in complex problems. T o tackle these issues, we combine the above two paradigms and propose a novel framework, LIGHT (Learning Individual Intrinsic reward via Incorporating Generalized Human experTise), which can integrate human knowledge into MARL algorithms in an end-to-end manner . LIGHT guides each agent to avoid unnecessary exploration by considering both individual action distribution and human expertise preference distribution. Then, LIGHT designs individual intrinsic rewards for each agent based on actionable representational transformation relevant to Q-learning so that the agents align their action preferences with the human expertise while maximizing the joint action value. Experimental results demonstrate the superiority of our method over representative baselines regarding performance and better knowledge reusability across different sparse-reward tasks on challenging scenarios. Cooperative multi-agent reinforcement learning (MARL) is an important branch in the field of artificial intelligence (AI), playing a crucial role in sequential challenging decision-making problems, such as in autonomous driving [1], sensor networks [2], [3] and robotics control [4]. Centralized training with decentralized execution (CTDE) paradigm has gained substantial attention in cooperative MARL that aims to facilitate agent cooperation by providing global state information during training and executing only based on local observations during execution [5], [6], [7].