Agents
Difficulty-Aware Agentic Orchestration for Query-Specific Multi-Agent Workflows
Su, Jinwei, Lan, Qizhen, Xia, Yinghui, Sun, Lifan, Tian, Weiyou, Shi, Tianyu, Song, Xinyuan, He, Lewei
Large Language Model (LLM)-based agentic systems have shown strong capabilities across various tasks. However, existing multi-agent frameworks often rely on static or task-level workflows, which either over-process simple queries or underperform on complex ones, while also neglecting the efficiency-performance trade-offs across heterogeneous LLMs. To address these limitations, we propose Difficulty-Aware Agentic Orchestration (DAAO), which can dynamically generate query-specific multi-agent workflows guided by predicted query difficulty. DAAO comprises three interdependent modules: a variational autoencoder (VAE) for difficulty estimation, a modular operator allocator, and a cost- and performance-aware LLM router. A self-adjusting policy updates difficulty estimates based on workflow success, enabling simpler workflows for easy queries and more complex strategies for harder ones. Experiments on six benchmarks demonstrate that DAAO surpasses prior multi-agent systems in both accuracy and inference efficiency, validating its effectiveness for adaptive, difficulty-aware reasoning.
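As a rough illustration of the idea of difficulty-guided orchestration, the sketch below maps a difficulty score in [0, 1] to a workflow and a model tier, then nudges the score after observing success or failure. Everything in it is an assumption made for illustration: the thresholds, the operator names, the model labels, and the update rule are hypothetical, and a hand-written rule stands in for the VAE estimator, operator allocator, and LLM router named in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowPlan:
    operators: tuple  # hypothetical operator names, in execution order
    model: str        # hypothetical model tier label


def plan_workflow(difficulty: float) -> WorkflowPlan:
    """Pick a workflow from a difficulty score in [0, 1] (illustrative thresholds)."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    if difficulty < 0.3:
        # Easy query: one cheap call, no extra operators.
        return WorkflowPlan(("direct_answer",), "small-llm")
    if difficulty < 0.7:
        # Medium query: add a reasoning step and a self-check.
        return WorkflowPlan(("chain_of_thought", "self_check"), "medium-llm")
    # Hard query: richest workflow on the strongest (most expensive) model.
    return WorkflowPlan(("decompose", "debate", "verify"), "large-llm")


def update_difficulty(estimate: float, succeeded: bool, step: float = 0.1) -> float:
    """Assumed feedback rule: lower the estimate after success, raise it after failure."""
    adjusted = estimate - step if succeeded else estimate + step
    return min(1.0, max(0.0, adjusted))


if __name__ == "__main__":
    score = 0.65
    print(plan_workflow(score))
    score = update_difficulty(score, succeeded=False)
    print(plan_workflow(score))
```

In this toy version, a failed medium-tier attempt pushes the score past the upper threshold, so the next attempt uses the richer workflow.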
Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First
Liu, Shu, Ponnapalli, Soujanya, Shankar, Shreya, Zeighami, Sepanta, Zhu, Alan, Agarwal, Shubham, Chen, Ruiqi, Suwito, Samion, Yuan, Shuo, Stoica, Ion, Zaharia, Matei, Cheung, Alvin, Crooks, Natacha, Gonzalez, Joseph E., Parameswaran, Aditya G.
Large Language Model (LLM) agents, acting on their users' behalf to manipulate and analyze data, are likely to become the dominant workload for data systems in the future. When working with data, agents employ a high-throughput process of exploration and solution formulation for the given task, one we call agentic speculation. The sheer volume and inefficiencies of agentic speculation can pose challenges for present-day data systems. We argue that data systems need to adapt to more natively support agentic workloads. We take advantage of the characteristics of agentic speculation that we identify (scale, heterogeneity, redundancy, and steerability) to outline a number of research opportunities for a new agent-first data systems architecture, ranging from new query interfaces, to new query processing techniques, to new agentic memory stores.
MOTIF: Multi-strategy Optimization via Turn-based Interactive Framework
Kiet, Nguyen Viet Tuan, Van Tung, Dao, Dao, Tran Cong, Binh, Huynh Thi Thanh
Designing effective algorithmic components remains a fundamental obstacle in tackling NP-hard combinatorial optimization problems (COPs), where solvers often rely on carefully hand-crafted strategies. Despite recent advances in using large language models (LLMs) to synthesize high-quality components, most approaches restrict the search to a single element - commonly a heuristic scoring function - thus missing broader opportunities for innovation. In this paper, we introduce a broader formulation of solver design as a multi-strategy optimization problem, which seeks to jointly improve a set of interdependent components under a unified objective. To address this, we propose Multi-strategy Optimization via Turn-based Interactive Framework (MOTIF) - a novel framework based on Monte Carlo Tree Search that facilitates turn-based optimization between two LLM agents. At each turn, an agent improves one component by leveraging the history of both its own and its opponent's prior updates, promoting both competitive pressure and emergent cooperation. This structured interaction broadens the search landscape and encourages the discovery of diverse, high-performing solutions. Experiments across multiple COP domains show that MOTIF consistently outperforms state-of-the-art methods, highlighting the promise of turn-based, multi-agent prompting for fully automated solver design.
Generative Large-Scale Pre-trained Models for Automated Ad Bidding Optimization
Lei, Yu, Zhao, Jiayang, Zhao, Yilei, Zhang, Zhaoqi, Cai, Linyou, Xie, Qianlong, Wang, Xingxing
Modern auto-bidding systems are required to balance overall performance with diverse advertiser goals and real-world constraints, reflecting the dynamic and evolving needs of the industry. Recent advances in conditional generative models, such as transformers and diffusers, have enabled direct trajectory generation tailored to advertiser preferences, offering a promising alternative to traditional Markov Decision Process-based methods. However, these generative methods face significant challenges, such as the distribution shift between offline and online environments, limited exploration of the action space, and the necessity to meet constraints like marginal Cost-per-Mille (CPM) and Return on Investment (ROI). To tackle these challenges, we propose GRAD (Generative Reward-driven Ad-bidding with Mixture-of-Experts), a scalable foundation model for auto-bidding that combines an Action-Mixture-of-Experts module for diverse bidding action exploration with the Value Estimator of Causal Transformer for constraint-aware optimization. Extensive offline and online experiments demonstrate that GRAD significantly enhances platform revenue, highlighting its effectiveness in addressing the evolving and diverse requirements of modern advertisers. Furthermore, GRAD has been implemented in multiple marketing scenarios at Meituan, one of the world's largest online food delivery platforms, leading to a 2.18% increase in Gross Merchandise Value (GMV) and 10.68% increase in ROI.
Semantic Chain-of-Trust: Autonomous Trust Orchestration for Collaborator Selection via Hypergraph-Aided Agentic AI
Zhu, Botao, Wang, Xianbin, Niyato, Dusit
The effective completion of tasks in collaborative systems hinges on task-specific trust evaluations of potential devices for distributed collaboration. Due to the independent operation of the devices involved, the dynamic evolution of their mutual relationships, and the complex situation-related impact on trust evaluation, effectively assessing devices' trust for collaborator selection is challenging. To overcome this challenge, we propose a semantic chain-of-trust model implemented with agentic AI and hypergraphs for supporting effective collaborator selection. We first introduce a concept of semantic trust, specifically designed to assess collaborators along multiple semantic dimensions for a more accurate representation of their trustworthiness. To facilitate intelligent evaluation, an agentic AI system is deployed on each device, empowering it to autonomously perform necessary operations, including device state detection, trust-related data collection, semantic extraction, and task-specific resource evaluation, to derive a semantic trust representation for each collaborator. In addition, each device leverages a hypergraph to dynamically manage potential collaborators according to different levels of semantic trust, enabling fast one-hop collaborator selection. Furthermore, adjacent trusted devices autonomously form a chain through the hypergraph structure, supporting multi-hop collaborator selection. Experimental results demonstrate that the proposed semantic chain-of-trust achieves 100% accuracy in trust evaluation based on historical collaborations, enabling intelligent, resource-efficient, and precise collaborator selection.
Autocratic strategies in Cournot oligopoly game
Ueda, Masahiko, Yagi, Shoma, Ichinose, Genki
An oligopoly is a market in which the price of goods is controlled by a few firms. Cournot introduced the simplest game-theoretic model of oligopoly, where profit-maximizing behavior of each firm results in market failure. Furthermore, when the Cournot oligopoly game is infinitely repeated, firms can tacitly collude to monopolize the market. Such tacit collusion is realized by the same mechanism as direct reciprocity in the repeated prisoner's dilemma game, where mutual cooperation can be realized even though defection is favorable for both prisoners in a one-shot game. Recently, in the repeated prisoner's dilemma game, a class of strategies called zero-determinant strategies has attracted much attention in the context of direct reciprocity. Zero-determinant strategies are autocratic strategies which unilaterally control payoffs of players by enforcing linear relationships between payoffs. There have been many attempts to find zero-determinant strategies in other games and to extend them so that they apply to broader situations. In this paper, first, we show that zero-determinant strategies exist even in the repeated Cournot oligopoly game, and that they are quite different from those in the repeated prisoner's dilemma game. In particular, we prove that a fair zero-determinant strategy exists, which is guaranteed to obtain the average payoff of the opponents. Second, we numerically show that the fair zero-determinant strategy can be used to promote collusion when it is used against one adaptively learning player, whereas it cannot promote collusion when it is used against two adaptively learning players. Our findings elucidate a negative impact of zero-determinant strategies in the oligopoly market.
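To make the gap between one-shot play and collusion concrete, here is a minimal sketch under a textbook specification that is assumed here and not taken from the paper: linear inverse demand P = a - b*Q and an identical constant marginal cost c for all n firms. Under these assumptions each firm produces (a - c) / (b * (n + 1)) in the one-shot Nash equilibrium, while firms that split the monopoly output equally each produce (a - c) / (2 * b * n). The function names and parameter values are illustrative.

```python
def cournot_nash_quantity(a: float, b: float, c: float, n: int) -> float:
    """Per-firm Nash output for inverse demand P = a - b*Q and marginal cost c."""
    return (a - c) / (b * (n + 1))


def collusive_quantity(a: float, b: float, c: float, n: int) -> float:
    """Per-firm output when n firms split the monopoly output (a - c) / (2b) equally."""
    return (a - c) / (2 * b * n)


def profit(a: float, b: float, c: float, own_q: float, total_q: float) -> float:
    """Profit of one firm given its own output and the total market output."""
    price = max(a - b * total_q, 0.0)  # price cannot fall below zero
    return (price - c) * own_q


if __name__ == "__main__":
    a, b, c, n = 10.0, 1.0, 1.0, 2  # illustrative parameter values

    q_nash = cournot_nash_quantity(a, b, c, n)
    q_coll = collusive_quantity(a, b, c, n)

    print("Nash:      q =", q_nash, "profit =", profit(a, b, c, q_nash, n * q_nash))
    print("Collusion: q =", q_coll, "profit =", profit(a, b, c, q_coll, n * q_coll))
```

With these values each duopolist earns 9 in the one-shot equilibrium and 10.125 under equal-split collusion, which is the incentive that repeated interaction can sustain.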
BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs
Guo, Mingning, Wu, Mengwei, He, Jiarun, Li, Shaoxian, Li, Haifeng, Tao, Chao
With the rapid advancement of low-altitude remote sensing and Vision-Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV-Embodied Agents (UAV-EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV-EAs. Specifically, we introduce a novel Dynamic Chain-of-Embodied-Task paradigm based on the perception-decision-action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing six core sub-skills: semantic perception, spatial perception, motion control, tool utilization, task planning and action generation. Furthermore, we develop a hybrid testing platform that incorporates a wide range of both virtual and real-world scenarios, enabling a comprehensive evaluation of UAV-EAs across diverse contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state-of-the-art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark is now publicly available at https://github.com/lostwolves/BEDI.
LOG-Nav: Efficient Layout-Aware Object-Goal Navigation with Hierarchical Planning
Hou, Jiawei, Xiao, Yuting, Xue, Xiangyang, Zeng, Taiping
We introduce LOG-Nav, an efficient layout-aware object-goal navigation approach designed for complex multi-room indoor environments. By planning hierarchically, leveraging a global topological map with layout information and a local imperative approach with detailed scene representation memory, LOG-Nav achieves both efficient and effective navigation. The process is managed by an LLM-powered agent, ensuring seamless and effective planning and navigation without the need for human interaction, complex rewards, or costly training. On the MP3D benchmark, LOG-Nav achieves an 85% object navigation success rate (SR) and a 79% success rate weighted by path length (SPL), an improvement of over 40 percentage points in SR and 60% in SPL compared to existing methods. Furthermore, we validate the robustness of our approach through virtual agent and real-world robotic deployment, showcasing its capability in practical scenarios.
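For readers unfamiliar with the two metrics quoted above, the following sketch computes SR and SPL using the definition commonly used in embodied navigation: SPL averages, over episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. The function names and episode numbers are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[int]) -> float:
    """Fraction of episodes in which the agent reached the goal."""
    if not successes:
        raise ValueError("need at least one episode")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[int],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length, averaged over episodes.

    successes[i] is 1 or 0, shortest_lengths[i] is the shortest-path length
    (assumed positive), and path_lengths[i] is the length the agent traveled.
    """
    if not successes:
        raise ValueError("need at least one episode")
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        # A failed episode contributes 0; an optimal successful one contributes 1.
        total += s * shortest / max(taken, shortest)
    return total / len(successes)


if __name__ == "__main__":
    # Three made-up episodes: optimal success, detour success, failure.
    successes = [1, 1, 0]
    shortest = [5.0, 10.0, 8.0]
    traveled = [5.0, 20.0, 12.0]
    print("SR  =", success_rate(successes))
    print("SPL =", spl(successes, shortest, traveled))
```

Because the per-episode ratio never exceeds 1, SPL is bounded above by SR, so a high SPL indicates both frequent success and near-shortest paths.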
A Physics-Informed Fixed Skyroad Model for Continuous UAS Traffic Management (C-UTM)
Zahed, Muhammad Junayed Hasan, Rastgoftar, Hossein
Unlike traditional multi-agent coordination frameworks, which assume a fixed number of agents, UAS traffic management (UTM) requires a platform that enables Uncrewed Aerial Systems (UAS) to freely enter or exit constrained low-altitude airspace. Consequently, the number of UAS operating in a given region is time-varying, with vehicles dynamically joining or leaving even in dense, obstacle-laden environments. The primary goal of this paper is to develop a computationally efficient management system that maximizes airspace usability while ensuring safety and efficiency. To achieve this, we first introduce physics-informed methods to structure fixed skyroads across multiple altitude layers of urban airspace, with the directionality of each skyroad designed to guarantee full reachability. We then present a novel Continuous UTM (C-UTM) framework that optimally allocates skyroads to UAS requests while accounting for the time-varying capacity of the airspace. Collectively, the proposed model addresses the key challenges of low-altitude UTM by providing a scalable, safe, and efficient solution for urban airspace usability.
Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models
Yang, Chen, Peng, Guangyue, Zhu, Jiaying, Le, Ran, Feng, Ruixiang, Zhang, Tao, Ruan, Wei, Liu, Xiaoqi, Cheng, Xiaoxue, Xu, Xiyun, Song, Yang, Gao, Yanzipeng, Jia, Yiming, Xing, Yun, Wen, Yuntao, Wang, Zekai, An, Zhenwei, Sun, Zhicong, Chen, Zongchao
We present Nanbeige4-3B, a family of small-scale but high-performing language models. Pretrained on 23T high-quality tokens and finetuned on over 30 million diverse instructions, Nanbeige4-3B extends the boundary of the scaling law for small language models. In pre-training, we design a Fine-Grained Warmup-Stable-Decay (FG-WSD) training scheduler, which progressively refines data mixtures across stages to boost model performance. In post-training, to improve the quality of the SFT data, we design a joint mechanism that integrates deliberative generation refinement and chain-of-thought reconstruction, yielding substantial gains on complex tasks. Following SFT, we employ our flagship reasoning model to distill Nanbeige4-3B through our proposed Dual Preference Distillation (DPD) method, which leads to further performance gains. Finally, we apply a multi-stage reinforcement learning phase, leveraging verifiable rewards and preference modeling to strengthen abilities on both reasoning and human alignment. Extensive evaluations show that Nanbeige4-3B not only significantly outperforms models of comparable parameter scale but also rivals much larger models across a wide range of benchmarks. The model checkpoints are available at https://huggingface.co/Nanbeige.
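As background for the scheduler name, the sketch below shows only the generic warmup-stable-decay learning-rate shape: a linear ramp to a peak, a constant plateau, then a decay to a floor. It is not the FG-WSD scheduler itself, whose fine-grained part is described above as refining data mixtures across stages. The linear decay, the function name, and all step counts are assumptions chosen for illustration.

```python
def wsd_learning_rate(
    step: int,
    total_steps: int,
    peak_lr: float,
    warmup_steps: int,
    decay_steps: int,
    min_lr: float = 0.0,
) -> float:
    """Generic warmup-stable-decay learning rate at a given 0-indexed step."""
    if warmup_steps <= 0 or decay_steps <= 0:
        raise ValueError("warmup_steps and decay_steps must be positive")
    if warmup_steps + decay_steps > total_steps:
        raise ValueError("warmup and decay phases exceed total_steps")

    # Phase 1: linear warmup, reaching peak_lr on the last warmup step.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Phase 2: stable plateau at the peak learning rate.
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr

    # Phase 3: linear decay from peak_lr to min_lr (decay shape is an assumption).
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    for s in (0, 9, 50, 80, 90, 100):
        lr = wsd_learning_rate(s, total_steps=100, peak_lr=1.0,
                               warmup_steps=10, decay_steps=20)
        print(s, lr)
```

A long plateau is what allows stage-wise changes, such as a new data mixture, to be introduced before the final decay; how FG-WSD schedules those stages is not specified in the abstract.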