Agents
Exploring the Potential of Metacognitive Support Agents for Human-AI Co-Creation
Gmeiner, Frederic, Luo, Kaitao, Wang, Ye, Holstein, Kenneth, Martelaro, Nikolas
Despite the potential of generative AI (GenAI) design tools to enhance design processes, professionals often struggle to integrate AI into their workflows. Fundamental cognitive challenges include the need to specify all design criteria as distinct parameters upfront (intent formulation) and designers' reduced cognitive involvement in the design process due to cognitive offloading, which can lead to insufficient problem exploration, underspecification, and limited ability to evaluate outcomes. Motivated by these challenges, we envision novel metacognitive support agents that assist designers in working more reflectively with GenAI. To explore this vision, we conducted exploratory prototyping through a Wizard of Oz elicitation study with 20 mechanical designers probing multiple metacognitive support strategies. We found that agent-supported users created more feasible designs than non-supported users, with differing impacts between support strategies. Based on these findings, we discuss opportunities and tradeoffs of metacognitive support agents and considerations for future AI-based design tools.
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Zhang, Guibin, Fu, Muxin, Wan, Guancheng, Yu, Miao, Wang, Kun, Yan, Shuicheng
Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory, which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both $\textit{high-level, generalizable insights}$ that enable the system to leverage cross-trial knowledge, and $\textit{fine-grained, condensed interaction trajectories}$ that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to $20.89\%$ and $10.12\%$, respectively, without any modifications to the original frameworks. Our codes are available at https://github.com/bingreeky/GMemory.
Decentralized Decision Making in Two Sided Manufacturing-as-a-Service Marketplaces
Advancements in digitization have enabled two-sided manufacturing-as-a-service (MaaS) marketplaces, which have significantly reduced product development time for designers. These platforms provide designers with access to manufacturing resources through a network of suppliers and have instant order placement capabilities. Two key decision-making levers are typically used to optimize the operations of these marketplaces: pricing and matching. Existing marketplaces operate in a centralized structure in which the platform has complete control over decision making. However, a decentralized organization of the platform enables transparency of information across clients and suppliers. This dissertation focuses on developing decision-making tools that enable decentralization in MaaS marketplaces. For pricing, a data-driven method is introduced that enables small service providers to price services based on specific attributes of the services offered. A data mining method recommends a network-based price to a supplier based on its attributes and the attributes of other suppliers on the platform. Three different approaches are considered for matching mechanisms. First, a reverse auction mechanism is introduced in which designers bid for manufacturing services and the mechanism chooses a supplier that can match the bid requirements and stated price. The second approach uses mechanism design and mathematical programming to develop a stable matching mechanism for matching orders to suppliers based on their preferences. Empirical simulations are used to test the mechanisms in a simulated 3D printing marketplace and to evaluate the impact of stability on its performance. The third approach considers the matching problem in a dynamic and stochastic environment where demand (orders) and supply (supplier capacities) arrive over time and matching is performed online.
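The stable-matching idea in the second approach can be illustrated with the classical deferred-acceptance (Gale-Shapley) algorithm. This is a textbook technique swapped in for illustration, not the dissertation's mathematical-programming formulation. The order and supplier names, the preference lists, and the assumption that each supplier can take exactly one order are all hypothetical.

```python
from collections import deque


def deferred_acceptance(order_prefs, supplier_prefs):
    """Return a stable one-to-one matching {order: supplier}.

    order_prefs: dict mapping each order to a list of suppliers, best first.
    supplier_prefs: dict mapping each supplier to a list of orders, best first.
    Assumes complete preference lists and equally many orders and suppliers.
    """
    # Rank tables let a supplier compare two orders in constant time.
    rank = {s: {o: i for i, o in enumerate(prefs)}
            for s, prefs in supplier_prefs.items()}
    next_choice = {o: 0 for o in order_prefs}  # next supplier index to try
    held = {}                                  # supplier -> order it holds
    free = deque(order_prefs)                  # orders without a supplier

    while free:
        order = free.popleft()
        supplier = order_prefs[order][next_choice[order]]
        next_choice[order] += 1
        current = held.get(supplier)
        if current is None:
            held[supplier] = order             # supplier was unmatched
        elif rank[supplier][order] < rank[supplier][current]:
            held[supplier] = order             # supplier trades up
            free.append(current)
        else:
            free.append(order)                 # proposal rejected
    return {o: s for s, o in held.items()}


def is_stable(matching, order_prefs, supplier_prefs):
    """True if no order and supplier prefer each other to their matches."""
    matched_order = {s: o for o, s in matching.items()}
    for o, prefs in order_prefs.items():
        for s in prefs:
            if s == matching[o]:
                break  # remaining suppliers are less preferred by o
            s_prefs = supplier_prefs[s]
            if s_prefs.index(o) < s_prefs.index(matched_order[s]):
                return False
    return True


# Hypothetical three-order, three-supplier marketplace.
order_prefs = {
    "o1": ["s1", "s2", "s3"],
    "o2": ["s1", "s3", "s2"],
    "o3": ["s2", "s1", "s3"],
}
supplier_prefs = {
    "s1": ["o2", "o1", "o3"],
    "s2": ["o1", "o3", "o2"],
    "s3": ["o3", "o2", "o1"],
}
matching = deferred_acceptance(order_prefs, supplier_prefs)
```

Here `is_stable(matching, order_prefs, supplier_prefs)` checks that no order and supplier would both rather be matched to each other, which is the property the abstract refers to as stability.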
Mapping Neural Signals to Agent Performance, A Step Towards Reinforcement Learning from Neural Feedback
Santaniello, Julia, Russell, Matthew, Jiang, Benson, Sassaroli, Donatello, Jacob, Robert, Sinapov, Jivko
Implicit Human-in-the-Loop Reinforcement Learning (HITL-RL) is a methodology that integrates passive human feedback into autonomous agent training while minimizing human workload. However, existing methods often rely on active instruction, requiring participants to teach an agent through unnatural expressions or gestures. We introduce NEURO-LOOP, an implicit feedback framework that utilizes the intrinsic human reward system to drive human-agent interaction. This work demonstrates the feasibility of a critical first step in the NEURO-LOOP framework: mapping brain signals to agent performance. Using functional near-infrared spectroscopy (fNIRS), we design a dataset to enable future research using passive Brain-Computer Interfaces for Human-in-the-Loop Reinforcement Learning. Participants are instructed to observe or guide a reinforcement learning agent in its environment while signals from the prefrontal cortex are collected. Using classical machine learning techniques, we conclude that a relationship exists between fNIRS data and agent performance. Finally, we highlight the potential that neural interfaces may offer to future applications of human-agent interaction, assistive AI, and adaptive autonomous systems.
Topology-Assisted Spatio-Temporal Pattern Disentangling for Scalable MARL in Large-scale Autonomous Traffic Control
Li, Rongpeng, Zhu, Jianhang, Huang, Jiahao, Zhao, Zhifeng, Zhang, Honggang
Intelligent Transportation Systems (ITSs) have emerged as a promising solution towards ameliorating urban traffic congestion, with Traffic Signal Control (TSC) identified as a critical component. Although Multi-Agent Reinforcement Learning (MARL) algorithms have shown potential in optimizing TSC through real-time decision-making, their scalability and effectiveness often suffer in large-scale, complex environments. These limitations primarily stem from a fundamental mismatch between the exponential growth of the state space driven by environmental heterogeneities and the limited modeling capacity of current solutions. To address these issues, this paper introduces a novel MARL framework that integrates Dynamic Graph Neural Networks (DGNNs) and Topological Data Analysis (TDA), aiming to enhance the expressiveness of environmental representations and improve agent coordination. Furthermore, inspired by the Mixture of Experts (MoE) architecture in Large Language Models (LLMs), a topology-assisted spatial pattern disentangling (TSD)-enhanced MoE is proposed, which leverages topological signatures to decouple graph features for specialized processing, thus improving the model's ability to characterize dynamic and heterogeneous local observations. The TSD module is also integrated into the policy and value networks of the Multi-agent Proximal Policy Optimization (MAPPO) algorithm, further improving decision-making efficiency and robustness. Extensive experiments conducted on real-world traffic scenarios, together with comprehensive theoretical analysis, validate the superior performance of the proposed framework, highlighting the model's scalability and effectiveness in addressing the complexities of large-scale TSC tasks.
SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation
Zhu, Ruiyan, Cheng, Xi, Liu, Ke, Zhu, Brian, Jin, Daniel, Parihar, Neeraj, Xu, Zhoutian, Gao, Oliver
We present SheetMind, a modular multi-agent framework powered by large language models (LLMs) for spreadsheet automation via natural language instructions. The system comprises three specialized agents: a Manager Agent that decomposes complex user instructions into subtasks; an Action Agent that translates these into structured commands using a Backus-Naur Form (BNF) grammar; and a Reflection Agent that validates alignment between generated actions and the user's original intent. Integrated into Google Sheets via a Workspace extension, SheetMind supports real-time interaction without requiring scripting or formula knowledge. Experiments on benchmark datasets demonstrate an 80 percent success rate on single-step tasks and approximately 70 percent on multi-step instructions, outperforming ablated and baseline variants. Our results highlight the effectiveness of multi-agent decomposition and grammar-based execution for bridging natural language and spreadsheet functionalities.
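To make the grammar-constrained command idea concrete, the sketch below validates command strings against a tiny BNF-style grammar before they would be executed. The grammar, the action names (`SET`, `CLEAR`, `SUM`), and the function `parse_command` are hypothetical illustrations; the abstract does not specify SheetMind's actual grammar or command set. Because this toy grammar is regular, a single regular expression is enough to recognise it.

```python
import re

# Hypothetical grammar (an assumption, not SheetMind's actual one):
#   <command> ::= <action> "(" <range> [ "," <value> ] ")"
#   <action>  ::= "SET" | "CLEAR" | "SUM"
#   <range>   ::= <cell> | <cell> ":" <cell>
#   <cell>    ::= <letter>+ <digit>+
#   <value>   ::= <number> | '"' <text> '"'
CELL = r"[A-Z]+[0-9]+"
COMMAND = re.compile(
    rf'''\s*(?P<action>SET|CLEAR|SUM)\s*\(\s*
         (?P<range>{CELL}(?::{CELL})?)\s*
         (?:,\s*(?P<value>-?\d+(?:\.\d+)?|"[^"]*")\s*)?
         \)\s*''',
    re.VERBOSE,
)


def parse_command(text):
    """Parse one command into a dict, or raise ValueError if it is invalid."""
    match = COMMAND.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid command: {text!r}")
    action, cell_range, value = match.group("action", "range", "value")
    # Extra check beyond the grammar: only SET takes a value, and it must.
    if (action == "SET") != (value is not None):
        raise ValueError(f"{action} has the wrong number of arguments")
    return {"action": action, "range": cell_range, "value": value}


parsed = parse_command("SET(A1:B2, 42)")
```

Rejecting anything outside the grammar means a malformed model output fails at parse time instead of reaching the spreadsheet.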
IndoorWorld: Integrating Physical Task Solving and Social Simulation in A Heterogeneous Multi-Agent Environment
Wu, Dekun, Brudy, Frederik, Liu, Bang, Wang, Yi
Virtual environments are essential to AI agent research. Existing environments for LLM agent research typically focus on either physical task solving or social simulation, with the former oversimplifying agent individuality and social dynamics, and the latter lacking physical grounding of social behaviors. We introduce IndoorWorld, a heterogeneous multi-agent environment that tightly integrates physical and social dynamics. By introducing novel challenges for LLM-driven agents in orchestrating social dynamics to influence physical environments and anchoring social interactions within world states, IndoorWorld opens up possibilities of LLM-based building occupant simulation for architectural design. We demonstrate the potential with a series of experiments within an office setting to examine the impact of multi-agent collaboration, resource competition, and spatial layout on agent behavior.
Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research
Hatakeyama-Sato, Kan, Nishida, Toshihiko, Kitamura, Kenta, Ushiku, Yoshitaka, Takahashi, Koichi, Nabae, Yuta, Hayakawa, Teruaki
Tokyo 152-8552, Japan. E-mail: kan.hatakeyama [at] weblab.t.u-tokyo.ac.jp
Abstract: This review explores the potential of foundation models to advance laboratory automation in the materials and chemical sciences. It emphasizes the dual roles of these models: cognitive functions for experimental planning and data analysis, and physical functions for hardware operations. While traditional laboratory automation has relied heavily on specialized, rigid systems, foundation models offer adaptability through their general-purpose intelligence and multimodal capabilities. Recent advancements have demonstrated the feasibility of using large language models (LLMs) and multimodal robotic systems to handle complex and dynamic laboratory tasks. However, significant challenges remain, including precision manipulation of hardware, integration of multimodal data, and ensuring operational safety. This paper outlines a roadmap highlighting future directions, advocating for close interdisciplinary collaboration, benchmark establishment, and strategic human-AI integration to realize fully autonomous experimental laboratories.
Keywords: Laboratory Automation; Foundation Models; Robotics; Artificial Intelligence; Materials Science
1. Expectations for Foundation Models in Materials Laboratory Automation
Laboratory automation, a technology aimed at automating experimental research, is expected to pave the way for a new research paradigm in materials science [1, 2, 3]. By rapidly and comprehensively executing numerous experiments, laboratory automation accelerates research, enhances reproducibility through precisely controlled robotic processes, and enables swift and distributed knowledge sharing among researchers worldwide [1]. This technology is anticipated to contribute significantly to the development of crucial devices and compounds, including catalysts for energy and chemical conversions, environmentally friendly plastics, solar cells, secondary batteries, fuel cells, thermoelectric conversion modules, nuclear fusion reactors, quantum computers, and energy-efficient computing systems [1, 4, 5]. The success of next-generation laboratory automation depends not only on experimental hardware but also on the utilization of artificial intelligence (AI), especially foundation models. Foundation models represent a new AI paradigm encompassing large language models like GPT-4 [6], multimodal models, and agent-related technologies. These foundation models and generative AI have begun to influence chemistry and materials science [7], giving rise to diverse applications including molecular and materials design [8, 9, 10], reaction pathway exploration [11], catalyst design [12], and even autonomous planning of chemical experiments [13]. Additionally, foundation models are being expanded to hardware control mechanisms, enabling natural language-driven robotic operations [14, 15].
Deep Fictitious Play-Based Potential Differential Games for Learning Human-Like Interaction at Unsignalized Intersections
Chen, Kehua, Zhang, Shucheng, Wang, Yinhai
Modeling vehicle interactions at unsignalized intersections is a challenging task due to the complexity of the underlying game-theoretic processes. Although prior studies have attempted to capture interactive driving behaviors, most approaches relied solely on game-theoretic formulations and did not leverage naturalistic driving datasets. In this study, we learn human-like interactive driving policies at unsignalized intersections using Deep Fictitious Play. Specifically, we first model vehicle interactions as a Differential Game, which is then reformulated as a Potential Differential Game. The weights in the cost function are learned from the dataset and capture diverse driving styles. We also demonstrate that our framework provides a theoretical guarantee of convergence to a Nash equilibrium. To the best of our knowledge, this is the first study to train interactive driving policies using Deep Fictitious Play. We validate the effectiveness of our Deep Fictitious Play-Based Potential Differential Game (DFP-PDG) framework using the INTERACTION dataset. The results demonstrate that the proposed framework achieves satisfactory performance in learning human-like driving policies. The learned individual weights effectively capture variations in driver aggressiveness and preferences. Furthermore, the ablation study highlights the importance of each component within our model.
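As background for the fictitious-play idea, the sketch below runs classical (tabular) fictitious play on a two-vehicle "go or yield" matrix game. It is a deliberately simplified stand-in: the paper's method is a deep-learning variant applied to a potential differential game, which is not reproduced here. The payoff numbers, the uniform initial beliefs, and the function name `fictitious_play` are assumptions made for illustration.

```python
def fictitious_play(payoff_a, payoff_b, steps=200):
    """Classical two-player fictitious play on a bimatrix game.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to A and B when
    A plays action i and B plays action j. At every step each player
    best-responds to the empirical frequency of the other's past actions.
    Returns the list of joint actions (a, b) that were played.
    """
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    counts_a = [1] * n_a  # B's belief counts over A's actions (uniform prior)
    counts_b = [1] * n_b  # A's belief counts over B's actions (uniform prior)
    history = []
    for _ in range(steps):
        # Expected payoffs up to a positive constant, so argmax is unchanged.
        value_a = [sum(payoff_a[i][j] * counts_b[j] for j in range(n_b))
                   for i in range(n_a)]
        value_b = [sum(payoff_b[i][j] * counts_a[i] for i in range(n_a))
                   for j in range(n_b)]
        act_a = max(range(n_a), key=value_a.__getitem__)
        act_b = max(range(n_b), key=value_b.__getitem__)
        counts_a[act_a] += 1
        counts_b[act_b] += 1
        history.append((act_a, act_b))
    return history


# Hypothetical payoffs; action 0 = go, action 1 = yield.
# Both going is a collision (-10 each); vehicle A gains more from going first.
payoff_a = [[-10, 3], [0, -1]]
payoff_b = [[-10, 0], [1, -1]]
history = fictitious_play(payoff_a, payoff_b)
go_share_a = sum(1 for a, _ in history if a == 0) / len(history)
```

With these payoffs the play settles on A going and B yielding, which is a pure Nash equilibrium of this toy game: neither vehicle gains by deviating alone.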
Cloud Infrastructure Management in the Age of AI Agents
Yang, Zhenning, Bhatnagar, Archit, Qiu, Yiming, Miao, Tongyuan, Kon, Patrick Tser Jern, Xiao, Yunming, Huang, Yibo, Casado, Martin, Chen, Ang
Cloud infrastructure is the cornerstone of the modern IT industry. However, managing this infrastructure effectively requires considerable manual effort from the DevOps engineering team. We make a case for developing AI agents powered by large language models (LLMs) to automate cloud infrastructure management tasks. In a preliminary study, we investigate the potential for AI agents to use different cloud/user interfaces such as software development kits (SDK), command line interfaces (CLI), Infrastructure-as-Code (IaC) platforms, and web portals. We report takeaways on their effectiveness on different management tasks, and identify research challenges and potential solutions.