Goto

Collaborating Authors

 Agents


Reviews: Multi-Agent Common Knowledge Reinforcement Learning

Neural Information Processing Systems

All reviewers agreed this paper is well written presenting some interesting novel ideas. Reviewers believe that integrating common knowledge directly into Multi-agent RL training is a nice idea, and suggests some interesting future directions of research. Initially, there were shared concerns and confusion though about a number of issues, most prominently about the matrix game example. After reading and discussing the authors rebuttal though it seems the authors adequately addressed some of the primary concerns, and the general sense is that this paper is solid and of interest to be presented at NeurIPS.


Reviews: MAVEN: Multi-Agent Variational Exploration

Neural Information Processing Systems

The Starcraft results also seem fine, but not so strong as it make it obvious that committed exploration is a crucial empirical improvement for QMIX - while MAVEN agents learn faster in 3s5z, the final performance looks the same; MAVEN agents seem to have less variability in final win rate on 5m_vs_6m; and QMIX actually seems to have better final performance on 10m_vs_11m. The results in figure 2 and 4 do however suggest that there may be scenarios where the advantage of MAVEN is higher. Minor comments: 1) line 64 and others: the subscript "qmix" should probably be wrapped in a "\text{}" 2) first eqn in section 3: inconsistency between using subscripts and superscripts, i.e. u_i and u i 3) line 81: perhaps better phrased as: "the *best* action of agent i..." 4) line 86: u_n i - u_ U i? 5) line 87: I was confused by what "the set of all possible such orderings over the action-values" means. Besides a degeneracy when some of the Q values are identical, isn't there only one valid ordering? Or are you just trying to cover that degeneracy? 6) Definition 1: perhaps add an intuitive explanation, e.g. "Intuitively, a Q-function is non-monotonic if the ordering of best actions for agent i can be affected by the other agents action choices at that time step."


Reviews: MAVEN: Multi-Agent Variational Exploration

Neural Information Processing Systems

The paper presents a new exploration strategy for decentralized MARL that is based on a joint latent variable that is shared between the agent. This paper is a difficult case. While the theoretical insights concerning the difficulty of the exploration problem in decentralized MARL are insightful, the experimental results were not good enough in the original submission to convince the reviewers. The algorithm was only in one case considerably better than the competitor QMix and other baseline comparison were missing. However, in the rebuttal the authors provided much better results as well as additional comparison to Qtrans.


On the Parallels Between Evolutionary Theory and the State of AI

arXiv.org Artificial Intelligence

This article critically examines the foundational principles of contemporary AI methods, exploring the limitations that hinder its potential. We draw parallels between the modern AI landscape and the 20th-century Modern Synthesis in evolutionary biology, and highlight how advancements in evolutionary theory that augmented the Modern Synthesis, particularly those of Evolutionary Developmental Biology, offer insights that can inform a new design paradigm for AI. By synthesizing findings across AI and evolutionary theory, we propose a pathway to overcome existing limitations, enabling AI to achieve its aspirational goals.


AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

arXiv.org Artificial Intelligence

Efficient experiment reproduction is critical to accelerating progress in artificial intelligence. However, the inherent complexity of method design and training procedures presents substantial challenges for automation. Notably, reproducing experiments often requires implicit domain-specific knowledge not explicitly documented in the original papers. To address this, we introduce the paper lineage algorithm, which identifies and extracts implicit knowledge from the relevant references cited by the target paper. Building on this idea, we propose AutoReproduce, a multi-agent framework capable of automatically reproducing experiments described in research papers in an end-to-end manner. AutoReproduce enhances code executability by generating unit tests alongside the reproduction process. To evaluate the reproduction capability, we construct ReproduceBench, a benchmark annotated with verified implementations, and introduce novel evaluation metrics to assess both the reproduction and execution fidelity. Experimental results demonstrate that AutoReproduce outperforms the existing strong agent baselines on all five evaluation metrics by a peak margin of over $70\%$. In particular, compared to the official implementations, AutoReproduce achieves an average performance gap of $22.1\%$ on $89.74\%$ of the executable experiment runs. The code will be available at https://github.com/AI9Stars/AutoReproduce.


Towards Natural Language Communication for Cooperative Autonomous Driving via Self-Play

arXiv.org Artificial Intelligence

Past work has demonstrated that autonomous vehicles can drive more safely if they communicate with one another than if they do not. However, their communication has often not been human-understandable. Using natural language as a vehicle-to-vehicle (V2V) communication protocol offers the potential for autonomous vehicles to drive cooperatively not only with each other but also with human drivers. In this work, we propose a suite of traffic tasks in autonomous driving where vehicles in a traffic scenario need to communicate in natural language to facilitate coordination in order to avoid an imminent collision and/or support efficient traffic flow. To this end, this paper introduces a novel method, LLM+Debrief, to learn a message generation and high-level decision-making policy for autonomous vehicles through multi-agent discussion. To evaluate LLM agents for driving, we developed a gym-like simulation environment that contains a range of driving scenarios. Our experimental results demonstrate that LLM+Debrief is more effective at generating meaningful and human-understandable natural language messages to facilitate cooperation and coordination than a zero-shot LLM agent. Our code and demo videos are available at https://talking-vehicles.github.io/.


Distributed Intelligence in the Computing Continuum with Active Inference

arXiv.org Artificial Intelligence

The Computing Continuum (CC) is an emerging Internet-based computing paradigm that spans from local Internet of Things sensors and constrained edge devices to large-scale cloud data centers. Its goal is to orchestrate a vast array of diverse and distributed computing resources to support the next generation of Internet-based applications. However, the distributed, heterogeneous, and dynamic nature of CC platforms demands distributed intelligence for adaptive and resilient service management. This article introduces a distributed stream processing pipeline as a CC use case, where each service is managed by an Active Inference (AIF) agent. These agents collaborate to fulfill service needs specified by SLOiDs, a term we introduce to denote Service Level Objectives that are aware of its deployed devices, meaning that non-functional requirements must consider the characteristics of the hosting device. We demonstrate how AIF agents can be modeled and deployed alongside distributed services to manage them autonomously. Our experiments show that AIF agents achieve over 90% SLOiD fulfillment when using tested transition models, and around 80% when learning the models during deployment. We compare their performance to a multi-agent reinforcement learning algorithm, finding that while both approaches yield similar results, MARL requires extensive training, whereas AIF agents can operate effectively from the start. Additionally, we evaluate the behavior of AIF agents in offloading scenarios, observing a strong capacity for adaptation. Finally, we outline key research directions to advance AIF integration in CC platforms.


Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research

arXiv.org Artificial Intelligence

Language agents powered by large language models (LLMs) have demonstrated remarkable capabilities in understanding, reasoning, and executing complex tasks. However, developing robust agents presents significant challenges: substantial engineering overhead, lack of standardized components, and insufficient evaluation frameworks for fair comparison. We introduce Agent Graph-based Orchestration for Reasoning and Assessment (AGORA), a flexible and extensible framework that addresses these challenges through three key contributions: (1) a modular architecture with a graph-based workflow engine, efficient memory management, and clean component abstraction; (2) a comprehensive suite of reusable agent algorithms implementing state-of-the-art reasoning approaches; and (3) a rigorous evaluation framework enabling systematic comparison across multiple dimensions. Through extensive experiments on mathematical reasoning and multimodal tasks, we evaluate various agent algorithms across different LLMs, revealing important insights about their relative strengths and applicability. Our results demonstrate that while sophisticated reasoning approaches can enhance agent capabilities, simpler methods like Chain-of-Thought often exhibit robust performance with significantly lower computational overhead. AGORA not only simplifies language agent development but also establishes a foundation for reproducible agent research through standardized evaluation protocols.


Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

arXiv.org Artificial Intelligence

Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset. Materials and Methods: We used the "Quick Access" dataset of the National Alzheimer's Coordinating Center (NACC). We identified highly cited published research manuscripts using NACC data and selected five studies that appeared reproducible using this dataset alone. Using GPT-4o, we created a simulated research team of LLM-based autonomous agents tasked with writing and executing code to dynamically reproduce the findings of each study, given only study Abstracts, Methods sections, and data dictionary descriptions of the dataset. Results: We extracted 35 key findings described in the Abstracts across 5 Alzheimer's studies. On average, LLM agents approximately reproduced 53.2% of findings per study. Numeric values and range-based findings often differed between studies and agents. The agents also applied statistical methods or parameters that varied from the originals, though overall trends and significance were sometimes similar. Discussion: In some cases, LLM-based agents replicated research techniques and findings. In others, they failed due to implementation flaws or missing methodological detail. These discrepancies show the current limits of LLMs in fully automating reproducibility assessments. Still, this early investigation highlights the potential of structured agent-based systems to provide scalable evaluation of scientific rigor. Conclusion: This exploratory work illustrates both the promise and limitations of LLMs as autonomous agents for automating reproducibility in biomedical research.


Scalable, Symbiotic, AI and Non-AI Agent Based Parallel Discrete Event Simulations

arXiv.org Artificial Intelligence

To fully leverage the potential of artificial intelligence (AI) systems in a trustworthy manner, it is desirable to couple multiple AI and non-AI systems together seamlessly for constraining and ensuring correctness of the output. This paper introduces a novel parallel discrete event simulation (PDES) based methodology to combine multiple AI and non-AI agents in a causal, rule-based way. Our approach tightly integrates the concept of passage of time, with each agent considered as an entity in the PDES framework and responding to prior requests from other agents. Such coupling mechanism enables the agents to work in a co-operative environment towards a common goal while many tasks run in parallel throughout the simulation. It further enables setting up boundaries to the outputs of the AI agents by applying necessary dynamic constraints using non-AI agents while allowing for scalability through deployment of hundreds of such agents in a larger compute cluster. Distributing smaller AI agents can enable extremely scalable simulations in the future, addressing local memory bottlenecks for model parameter storage. Within a PDES involving both AI and non-AI agents, we break down the problem at hand into structured steps, when necessary, providing a set of multiple choices to the AI agents, and then progressively solve these steps towards a final goal. At each step, the non-AI agents act as unbiased auditors, verifying each action by the AI agents so that certain rules of engagement are followed. We evaluate our approach by solving four problems from four different domains and comparing the results with those from AI models alone. Our results show greater accuracy in solving problems from various domains where the AI models struggle to solve the problems solely by themselves. Results show that overall accuracy of our approach is 68% where as the accuracy of vanilla models is less than 23%.