Agents
LOKA Protocol: A Decentralized Framework for Trustworthy and Ethical AI Agent Ecosystems
Ranjan, Rajesh, Gupta, Shailja, Singh, Surya Narayan
The rise of autonomous AI agents, capable of perceiving, reasoning, and acting independently, signals a profound shift in how digital ecosystems operate, govern, and evolve. As these agents proliferate beyond centralized infrastructures, they expose foundational gaps in identity, accountability, and ethical alignment. Three critical questions emerge: Identity: Who or what is the agent? Accountability: Can its actions be verified, audited, and trusted? Ethical Consensus: Can autonomous systems reliably align with human values and prevent harmful emergent behaviors? We present the novel LOKA Protocol (Layered Orchestration for Knowledgeful Agents), a unified, systems-level architecture for building ethically governed, interoperable AI agent ecosystems. LOKA introduces a proposed Universal Agent Identity Layer (UAIL) for decentralized, verifiable identity; intent-centric communication protocols for semantic coordination across diverse agents; and a Decentralized Ethical Consensus Protocol (DECP) that could enable agents to make context-aware decisions grounded in shared ethical baselines. Anchored in emerging standards such as Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), and post-quantum cryptography, LOKA proposes a scalable, future-resilient blueprint for multi-agent AI governance. By embedding identity, trust, and ethics into the protocol layer itself, LOKA proposes the foundation for a new era of responsible, transparent, and autonomous AI ecosystems operating across digital and physical domains.
Why Ask One When You Can Ask $k$? Two-Stage Learning-to-Defer to the Top-$k$ Experts
Montreuil, Yannis, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang
Learning-to-Defer (L2D) enables decision-making systems to improve reliability by selectively deferring uncertain predictions to more competent agents. However, most existing approaches focus exclusively on single-agent deferral, which is often inadequate in high-stakes scenarios that require collective expertise. We propose Top-$k$ Learning-to-Defer, a generalization of the classical two-stage L2D framework that allocates each query to the $k$ most confident agents instead of a single one. To further enhance flexibility and cost-efficiency, we introduce Top-$k(x)$ Learning-to-Defer, an adaptive extension that learns the optimal number of agents to consult for each query, based on input complexity, agent competency distributions, and consultation costs. For both settings, we derive a novel surrogate loss and prove that it is Bayes-consistent and $(\mathcal{R}, \mathcal{G})$-consistent, ensuring convergence to the Bayes-optimal allocation. Notably, we show that the well-established model cascades paradigm arises as a restricted instance of our Top-$k$ and Top-$k(x)$ formulations. Extensive experiments across diverse benchmarks demonstrate the effectiveness of our framework on both classification and regression tasks.
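The core allocation step described above, sending a query to the $k$ most confident agents, can be illustrated with a minimal sketch. It is not the authors' method: the function name `top_k_agents` and the idea of passing in precomputed per-agent confidence scores are assumptions made for illustration, and the surrogate loss and the learned, input-dependent $k(x)$ are not modeled.

```python
from typing import List, Sequence


def top_k_agents(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring agents, best first.

    `scores` is a hypothetical list of per-agent confidence scores for one
    query; how such scores are learned is outside the scope of this sketch.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of agents")
    # Sort agent indices by descending score; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Example: four agents, consult the two most confident ones.
selected = top_k_agents([0.2, 0.9, 0.5, 0.7], k=2)
```

With `k=1` the selection reduces to deferring to a single agent, which corresponds to the classical single-agent setting mentioned in the abstract.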
Interpretable Locomotion Prediction in Construction Using a Memory-Driven LLM Agent With Chain-of-Thought Reasoning
Construction workers face significant risks of work-related musculoskeletal disorders (WMSDs), driven by repetitive tasks, heavy load handling, and non-neutral postures in dynamic, unpredictable environments [1, 10]. In the U.S., construction workers experience an 11% higher WMSD rate than the average across industries, with the back and shoulders most affected [10]. While exoskeletons show promise in reducing physical strain--passive designs lowering back muscle activity by 10-40% and active ones achieving up to 80% reductions across multiple regions [5]--their practical deployment remains limited by discomfort and poor alignment with human movements, particularly in construction settings [6]. Central to these limitations is the challenge of accurately recognizing user intent across varied tasks, a gap that restricts effective collaboration [3, 34]. This misalignment heightens safety risks, as powered exoskeletons may generate destructive forces if their controlled output deviates from the user's intent [34]. Addressing this locomotion intent recognition challenge is pivotal to unlocking effective exoskeleton assistance in construction, particularly for diverse, safety-critical tasks like ladder climbing and obstacle navigation. Traditional evaluation of assistive technologies like lower-limb exoskeletons has focused narrowly on routine tasks such as straight walking [27], neglecting these critical locomotion modes and requiring a shift beyond conventional control paradigms that lack flexibility for dynamic contexts. Construction tasks are highly variable, requiring workers to adapt to shifting demands, irregular workflows, and unstructured environments where movement patterns are unpredictable [10]. This variability complicates the implementation of assistive technologies, as rigid control approaches struggle to accommodate rapid task transitions and environmental uncertainty.
FlowReasoner: Reinforcing Query-Level Meta-Agents
Gao, Hongcheng, Liu, Yue, He, Yufei, Dou, Longxu, Du, Chao, Deng, Zhijie, Hooi, Bryan, Lin, Min, Pang, Tianyu
This paper proposes a query-level meta-agent named FlowReasoner to automate the design of query-level multi-agent systems, i.e., one system per user query. Our core idea is to incentivize a reasoning-based meta-agent via external execution feedback. Concretely, by distilling DeepSeek R1, we first endow FlowReasoner with the basic ability to reason about generating multi-agent systems. Then, we further enhance it via reinforcement learning (RL) with external execution feedback. A multi-purpose reward is designed to guide the RL training from the aspects of performance, complexity, and efficiency. In this manner, FlowReasoner is enabled to generate a personalized multi-agent system for each user query via deliberative reasoning. Experiments on both engineering and competition code benchmarks demonstrate the superiority of FlowReasoner. Remarkably, it surpasses o1-mini by 10.52% in accuracy across three benchmarks. The code is available at https://github.com/sail-sg/FlowReasoner.
Behavioral Universe Network (BUN): A Behavioral Information-Based Framework for Complex Systems
Zhou, Wei, Borjigin, Ailiya, He, Cong
Modern digital ecosystems are characterized by complex, dynamic interactions among autonomous entities across diverse domains. Traditional paradigms often treat agents and objects separately, failing to provide a unified theoretical foundation to capture their interactive behaviors. This paper introduces the Behavioral Universe Network (BUN), a theoretical framework grounded in the Agent-Interaction-Behavior (AIB) formalism. BUN treats subjects (active agents), objects (resources), and behaviors (operations) as first-class citizens, all governed by a shared Behavioral Information Base (BIB). We first detail the AIB core principles, defining how subjects, objects, and behaviors are formally described and regulated. We then describe BUN as a framework, showcasing how information-driven triggers, semantic object enrichment, and adaptive rules enable highly coordinated multi-agent systems. We highlight the framework's key advantages: more accurate behavior analysis, strong adaptability to dynamic environments, and cross-domain synergies. Finally, we outline open challenges and future work, positioning BUN as a promising foundation for next-generation digital governance and intelligent applications.
Neural ATTF: A Scalable Solution to Lifelong Multi-Agent Path Planning
Shah, Kushal, Park, Jihyun, Choi, Seung-Kyum
Multi-Agent Pickup and Delivery (MAPD) is a fundamental problem in robotics, particularly in applications such as warehouse automation and logistics. Existing solutions often face challenges in scalability, adaptability, and efficiency, limiting their applicability in dynamic environments with real-time planning requirements. This paper presents Neural ATTF (Adaptive Task Token Framework), a new algorithm that combines a Priority Guided Task Matching (PGTM) Module with Neural STA* (Space-Time A*), a data-driven path planning method. Neural STA* enhances path planning by enabling rapid exploration of the search space through guided learned heuristics and ensures collision avoidance under dynamic constraints. PGTM prioritizes delayed agents and dynamically assigns tasks by prioritizing agents nearest to these tasks, optimizing both continuity and system throughput. Experimental evaluations against state-of-the-art MAPD algorithms, including TPTS, CENTRAL, RMCA, LNS-PBS, and LNS-wPBS, demonstrate the superior scalability, solution quality, and computational efficiency of Neural ATTF. These results highlight the framework's potential for addressing the critical demands of complex, real-world multi-agent systems operating in high-demand, unpredictable settings.
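For readers unfamiliar with Space-Time A*, the following is a minimal sketch of the classical (non-neural) algorithm on a 4-connected grid. It is an illustration under stated assumptions, not the Neural STA* of the paper: the Manhattan distance stands in for the learned heuristic, the function name and the `reserved` set of `(cell, timestep)` pairs are hypothetical, and only vertex conflicts are checked (swap conflicts and conflicts after arrival are ignored).

```python
import heapq


def space_time_a_star(grid, start, goal, reserved, max_t=50):
    """Search over (cell, timestep) states, avoiding reserved pairs.

    grid: list of equal-length strings, '#' marks a static obstacle.
    start, goal: (row, col) tuples.
    reserved: set of ((row, col), t) pairs occupied by other agents.
    Returns a list of cells indexed by timestep, or None if no path is
    found within max_t steps.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: a stand-in for a learned heuristic.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f, t, cell)
    parent = {(start, 0): None}
    while open_heap:
        _, t, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start to rebuild the path.
            path, node = [], (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t >= max_t:
            continue
        # Wait in place or move to one of the four neighbours.
        for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#":
                continue
            state = (nxt, t + 1)
            if state in reserved or state in parent:
                continue
            parent[state] = (cell, t)
            heapq.heappush(open_heap, (t + 1 + h(nxt), t + 1, nxt))
    return None


# Example: another agent occupies cell (0, 1) at timestep 1,
# so the planner has to wait or detour.
grid = ["...", "...", "..."]
path = space_time_a_star(grid, (0, 0), (0, 2), reserved={((0, 1), 1)})
```

Because every action, including waiting, costs one timestep, the cost of reaching a state `(cell, t)` is always `t`, so each state needs to be expanded at most once.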
Rhythm of Opinion: A Hawkes-Graph Framework for Dynamic Propagation Analysis
Li, Yulong, Lu, Zhixiang, Tang, Feilong, Lai, Simin, Hu, Ming, Zhang, Yuxuan, Xue, Haochen, Wu, Zhaodong, Razzak, Imran, Li, Qingxia, Su, Jionglong
The rapid development of social media has significantly reshaped the dynamics of public opinion, resulting in complex interactions that traditional models fail to effectively capture. To address this challenge, we propose an innovative approach that integrates multi-dimensional Hawkes processes with Graph Neural Networks, modeling opinion propagation dynamics among nodes in a social network while considering the intricate hierarchical relationships between comments. The extended multi-dimensional Hawkes process captures the hierarchical structure, multi-dimensional interactions, and mutual influences across different topics, forming a complex propagation network. Moreover, recognizing the lack of high-quality datasets capable of comprehensively capturing the evolution of public opinion dynamics, we introduce a new dataset, VISTA. It includes 159 trending topics, corresponding to 47,207 posts, 327,015 second-level comments, and 29,578 third-level comments, covering diverse domains such as politics, entertainment, sports, health, and medicine. The dataset is annotated with detailed sentiment labels across 11 categories and clearly defined hierarchical relationships. When combined with our method, it offers strong interpretability by linking sentiment propagation to the comment hierarchy and temporal evolution. Our approach provides a robust baseline for future research.
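As background for the multi-dimensional Hawkes process mentioned above, the sketch below evaluates the textbook conditional intensity with an exponential kernel: a baseline rate `mu[i]` plus, for every past event at time `s` in dimension `j`, an excitation term `alpha[i][j] * exp(-beta[i][j] * (t - s))`. This is the standard form, not the extended hierarchical model of the paper; the function name, the parameter layout, and the choice of an exponential kernel are assumptions made for illustration.

```python
import math


def hawkes_intensity(i, t, mu, alpha, beta, events):
    """Conditional intensity of dimension i at time t.

    mu[i]: baseline rate of dimension i.
    alpha[i][j], beta[i][j]: excitation size and decay rate of the
        influence that events in dimension j exert on dimension i.
    events[j]: list of event times observed in dimension j.
    """
    total = mu[i]
    for j, times in enumerate(events):
        for s in times:
            if s < t:  # only strictly earlier events contribute
                total += alpha[i][j] * math.exp(-beta[i][j] * (t - s))
    return total


# Example with two dimensions (e.g. two topics): one event in
# dimension 0 at time 0.0 raises the intensity of dimension 1.
mu = [0.1, 0.2]
alpha = [[0.5, 0.0], [0.3, 0.4]]
beta = [[1.0, 1.0], [1.0, 1.0]]
events = [[0.0], []]
lam = hawkes_intensity(1, 1.0, mu, alpha, beta, events)
```

The off-diagonal entries of `alpha` are what encode mutual influence between dimensions; setting them to zero yields independent self-exciting processes.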
Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work
Johnson, Janet G., Peralta, Macarena, Kaur, Mansanjam, Huang, Ruijie Sophia, Zhao, Sheng, Guan, Ruijia, Rajaram, Shwetha, Nebeling, Michael
While generative artificial intelligence (GenAI) is finding increased adoption in workplaces, current tools are primarily designed for individual use. Prior work established the potential for these tools to enhance personal creativity and productivity towards shared goals; however, we do not yet know how best to take into account the nuances of group work and team dynamics when deploying GenAI in work settings. In this paper, we investigate the potential of collaborative GenAI agents to augment teamwork in synchronous group settings through an exploratory study that engaged 25 professionals across 6 teams in speculative design workshops and individual follow-up interviews. Our workshops included a mixed reality provotype to simulate embodied collaborative GenAI agents capable of actively participating in group discussions. Our findings suggest that, if designed well, collaborative GenAI agents offer valuable opportunities to enhance team problem-solving by challenging groupthink, bridging communication gaps, and reducing social friction. However, teams' willingness to integrate GenAI agents depended on their perceived fit across a number of individual, team, and organizational factors. We outline the key design tensions around agent representation, social prominence, and engagement, and highlight the opportunities spatial and immersive technologies could offer to modulate GenAI influence on team outcomes and strike a balance between augmentation and agency.
An LLM-enabled Multi-Agent Autonomous Mechatronics Design Framework
Wang, Zeyu, Lo, Frank P.-W., Chen, Qian, Zhang, Yongqi, Lin, Chen, Chen, Xu, Yu, Zhenhua, Thompson, Alexander J., Yeatman, Eric M., Lo, Benny P. L.
Powered by transformer architectures [11, 12] and trained on massive datasets, models such as GPT-4 [13], Claude [14], DeepSeek [15], and PaLM [16] exhibit strong performance in chain-of-thought reasoning [17, 18], few-shot learning [19, 20], and even multimodal understanding when extended to vision-language settings [21, 22]. These capabilities have enabled LLMs not only to perform linguistic tasks but also to engage in procedural synthesis [23] and structured decision-making [24, 25], laying the foundation for their integration into agent-based systems capable of autonomous planning and tool use. The rise of agent-based systems [26] marks a critical milestone in artificial intelligence, enabling entities to autonomously perceive, reason, and act within specific environments [27]. LLM-driven agents [28] further enhance these capabilities through sophisticated linguistic comprehension and generation, proving effective in diverse applications such as text summarization [29], software debugging [30, 31], documentation automation [32, 33], customer support [34, 35], mathematical theorem synthesis [36, 37], virtual environment navigation [38], and structured data querying [39, 40]. In industry, LLM agents have streamlined narrowly defined workflows, including report generation [41-43] and basic data analytics [44, 45], significantly improving operational efficiency. However, current LLM-based agent implementations remain primarily confined to digital or simulated environments, which limits their practical application in complex engineering tasks that require the design of physical embodiment, cross-domain integration, and constraint-aware reasoning. Recent attempts to bridge this gap have emerged, demonstrating initial integration of LLM agents with physical experimentation in domains such as autonomous chemical synthesis [46], materials design [47-49], drug discovery [50], and adaptive multi-agent manufacturing systems [51].
Despite these advancements, little attention has been given to LLM-enabled multi-agent frameworks targeting autonomous mechatronics design, a field inherently requiring multidisciplinary expertise across mechanical engineering, electronics, control systems, and software development.
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
Huang, Yuting, Ding, Leilei, Tang, Zhipeng, Wang, Tianfu, Lin, Xinrui, Zhang, Wuyang, Ma, Mingxiao, Zhang, Yanyong
Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents due to their advanced reasoning and comprehension. However, the systemic safety of these agents remains an underexplored frontier. In this study, we present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. SafePlan-Bench establishes a comprehensive benchmark for evaluating task-planning safety, encompassing 2,027 daily tasks and corresponding environments distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors. To mitigate these hazards, we propose Safe-Align, a method designed to integrate physical-world safety knowledge into LLM-based embodied agents while maintaining task-specific performance. Experiments across a variety of settings demonstrate that Safe-BeAl provides comprehensive safety validation, improving safety by 8.55-15.22% compared to embodied agents based on GPT-4, while ensuring successful task completion.