Agents
One Person, One Bot
This short paper puts forward a vision for a new democratic model enabled by recent technological advances in agentic AI. It therefore opens by drawing a clear and concise picture of the model, and only later addresses related proposals, research directions, and concerns regarding feasibility and safety. It ends with a note on the timeliness of this idea and on optimism. The proposed model assigns each citizen an AI agent that would serve as their political delegate, enabling a return to direct democracy. The paper examines this model's relation to existing research, its potential setbacks, and its feasibility, and argues for its further development.
MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
Wang, Zhengren, Ling, Rui, Wang, Chufan, Yu, Yongan, Li, Zhiyu, Xiong, Feiyu, Zhang, Wentao
Modern code generation has made significant strides in functional correctness and execution efficiency. However, these systems often overlook a critical dimension in real-world software development: maintainability. To handle dynamic requirements with minimal rework, we propose MaintainCoder as a pioneering solution. It integrates the Waterfall model, design patterns, and multi-agent collaboration to systematically enhance cohesion, reduce coupling, and improve adaptability. We also introduce MaintainBench, a benchmark comprising requirement changes and corresponding dynamic metrics on maintenance effort. Experiments demonstrate that existing code generation methods struggle to meet maintainability standards when requirements evolve. In contrast, MaintainCoder improves maintainability metrics by 14-30% with even higher correctness, i.e., pass@k. Our work not only provides the foundation of maintainable code generation, but also highlights the need for more holistic code quality research. Resources: https://github.com/IAAR-Shanghai/MaintainCoder.
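The abstract above reports correctness as pass@k. As a reminder of what that metric measures, the following is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. The function name and the sample counts are illustrative assumptions; they are not taken from MaintainCoder or MaintainBench, and the paper may compute the metric differently.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of generated samples (assumed n >= k >= 1)
    c: number of those samples that pass all tests
    k: budget of samples the user is allowed to try
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts for a single problem: 10 samples, 3 correct.
    for k in (1, 5, 10):
        print(k, pass_at_k(10, 3, k))
```

A benchmark-level score would typically be the mean of this per-problem estimate over all problems.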
Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantees via Constrained Mean-Field Reinforcement Learning
Jusup, Matej, Zhang, Kenan, Hu, Zhiyuan, Pásztor, Barna, Krause, Andreas, Corman, Francesco
The rapid expansion of ride-sourcing services such as Uber, Lyft, and Didi Chuxing has fundamentally reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite their convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing - the strategic repositioning of thousands of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing results in prolonged rider waiting times, inefficient vehicle utilization, and inequitable distribution of services, leading to disparities in driver availability and income. To tackle these complexities, we introduce scalable continuous-state mean-field control (MFC) and reinforcement learning (MFRL) models that explicitly represent each vehicle's precise location and employ continuous repositioning actions guided by the distribution of other vehicles. To ensure equitable service distribution, an accessibility constraint is integrated within our optimal control formulation, balancing operational efficiency with equitable access to the service across geographic regions. Our approach acknowledges realistic conditions, including inherent stochasticity in transitions, the simultaneous occurrence of vehicle-rider matching, vehicles' rebalancing and cruising, and variability in rider behaviors. Crucially, we relax the traditional mean-field assumption of equal supply-demand volume, better reflecting practical scenarios. Extensive empirical evaluation using real-world data-driven simulation of Shenzhen demonstrates the real-time efficiency and robustness of our approach at the scale of tens of thousands of vehicles. The code is available at https://github.com/mjusup1501/mf-vehicle-rebalancing.
Pro-Routing: Proactive Routing of Autonomous Multi-Capacity Robots for Pickup-and-Delivery Tasks
Garces, Daniel, Gil, Stephanie
We consider a multi-robot setting, where we have a fleet of multi-capacity autonomous robots that must service spatially distributed pickup-and-delivery requests with fixed maximum wait times. Requests can be either scheduled ahead of time or they can enter the system in real-time. In this setting, stability for a routing policy is defined as the cost of the policy being uniformly bounded over time. Most previous work either solves the problem offline to theoretically maintain stability or considers dynamically arriving requests at the expense of theoretical guarantees on stability. In this paper, we aim to bridge this gap by proposing a novel proactive rollout-based routing framework that adapts to real-time demand while still provably maintaining the stability of the learned routing policy. We derive provable stability guarantees for our method by proposing a fleet sizing algorithm that obtains a sufficiently large fleet that ensures stability by construction. To validate our theoretical results, we consider a case study on real ride requests for Harvard's evening Van System. We also evaluate the performance of our framework using the currently deployed smaller fleet size. In this smaller setup, we compare against the currently deployed routing algorithm, greedy heuristics, and Monte-Carlo-Tree-Search-based algorithms. Our empirical results show that our framework maintains stability when we use the sufficiently large fleet size found in our theoretical results. For the smaller currently deployed fleet size, our method services 6% more requests than the closest baseline while reducing median passenger wait times by 33%.
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy
Zhao, Zhonghan, Zhang, Wenwei, Huang, Haian, Liu, Kuikun, Gao, Jianfei, Wang, Gaoang, Chen, Kai
Reasoning before action and imagining potential outcomes (i.e., world models) are essential for embodied agents operating in complex open-world environments. Yet, prior work either incorporates only one of these abilities in an end-to-end agent or integrates multiple specialized models into an agent system, limiting the learning efficiency and generalization of the policy. Thus, this paper makes the first attempt to synergize Reasoning and Imagination in an end-to-end Generalist policy, termed RIG. To train RIG in an end-to-end manner, we construct a data pipeline that progressively integrates and enriches the content of imagination and reasoning in the trajectories collected from existing agents. The joint learning of reasoning and next image generation explicitly models the inherent correlation between reasoning, action, and dynamics of environments, and thus exhibits more than $17\times$ sample efficiency improvements and generalization in comparison with previous works. During inference, RIG first reasons about the next action, produces a potential action, and then predicts the action's outcomes, which offers the agent a chance to review and self-correct based on the imagination before taking real actions. Experimental results show that the synergy of reasoning and imagination not only improves the robustness, generalization, and interoperability of the generalist policy but also enables test-time scaling to enhance overall performance.
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models
Zhang, Jianguo, Hoang, Thai, Zhu, Ming, Liu, Zuxin, Wang, Shiyu, Awalgaonkar, Tulika, Prabhakar, Akshara, Chen, Haolin, Yao, Weiran, Liu, Zhiwei, Tan, Juntao, Niebles, Juan Carlos, Heinecke, Shelby, Wang, Huan, Savarese, Silvio, Xiong, Caiming
Action models are essential for enabling autonomous agents to perform complex tasks. However, training large action models remains challenging due to the diversity of agent environments and the complexity of agentic data. Despite growing interest, existing infrastructure provides limited support for scalable, agent-specific fine-tuning. We present ActionStudio, a lightweight and extensible data and training framework designed for large action models. ActionStudio unifies heterogeneous agent trajectories through a standardized format, supports diverse training paradigms including LoRA, full fine-tuning, and distributed setups, and integrates robust preprocessing and verification tools. We validate its effectiveness across both public and realistic industry benchmarks, demonstrating strong performance and practical scalability. We open-sourced code and data at https://github.com/SalesforceAIResearch/xLAM to facilitate research in the community.
GenSwarm: Scalable Multi-Robot Code-Policy Generation and Deployment via Language Models
Ji, Wenkang, Chen, Huaben, Chen, Mingyang, Zhu, Guobin, Xu, Lufeng, Groß, Roderich, Zhou, Rui, Cao, Ming, Zhao, Shiyu
The present paradigm of developing multi-robot systems follows a complex and labor-intensive process that involves steps like task analysis, algorithm design, code programming, simulation validation, and real-world deployment. This paradigm requires skilled professionals who are familiar with both theories and software/hardware implementation, incurring high costs in human resources. Moreover, it does not adapt well to dynamically changing tasks: the emergence of a new task requires the repetition of the complex process. Automatic generation and deployment of control policies for multi-robot systems is an appealing paradigm, as it promises substantial savings in terms of human effort and other resources [3-5]. However, this paradigm is nontrivial to realize as a multi-robot system as a whole cannot be programmed directly; rather, a desired collective behavior can be achieved only by programming each individual robot, which relies on its locally available information. Previous methods for automatic development of multi-robot swarming are primarily based on optimization techniques [3, 5]. For instance, an objective function is first crafted to mathematically describe a desired task and then optimized to generate policies through methods such as evolutionary computation [5-7] or systematic search [8]. Despite their promise, these optimization methods face the common limitation of requiring manual crafting of objective functions.
Collisionless and Decentralized Formation Control for Strings
Choi, Young-Pil, Kalise, Dante, Peters, Andrés A.
Multi-agent systems (MAS) have proven to be a versatile framework for studying diverse scalability problems in Science and Engineering, such as dynamic networks [35], autonomous vehicles [5], collective behaviour of humans or animals [42, 43], and many others [2, 6]. Mathematically, MAS are often modelled as large-scale dynamical systems where each agent can be considered as a subset of states, updated via interaction forces such as attraction, repulsion, alignment, etc., [27, 19] or through the optimization of a pay-off function in a control/game framework [32, 29]. In this work, we approach the study of MAS from a control viewpoint. We study a class of sparsely interconnected agents in one dimension, interacting through nonlinear couplings and a decentralized control law. The elementary building block of our approach is the celebrated Cucker-Smale model for consensus dynamics [19], which corresponds to a MAS where each agent is endowed with second-order nonlinear dynamics for velocity alignment, and where the influence of neighbouring agents decays with distance. The Cucker-Smale model and variants can represent the physical motion of agents on the real line, inspired by autonomous vehicle formations in platooning with a nearest-neighbour interaction scheme [41, 44].
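To make the building block mentioned above concrete, here is a minimal sketch of the classical Cucker-Smale consensus dynamics on the real line, integrated with explicit Euler steps: each agent's velocity is pulled toward the others with a weight that decays with distance. It uses the standard all-to-all form with influence function 1 / (1 + r^2)^beta. The function names, the value of beta, the step size, and the initial data are illustrative assumptions, and the sketch includes neither the nearest-neighbour coupling nor the decentralized control law studied in the paper.

```python
def influence(r: float, beta: float = 0.5) -> float:
    """Distance-decaying communication weight (classical Cucker-Smale choice)."""
    return 1.0 / (1.0 + r * r) ** beta


def cucker_smale_step(x, v, dt=0.01, beta=0.5):
    """Advance positions x and velocities v by one explicit Euler step."""
    n = len(x)
    # Velocity alignment: weighted average of velocity differences.
    acc = [
        sum(influence(abs(x[j] - x[i]), beta) * (v[j] - v[i]) for j in range(n)) / n
        for i in range(n)
    ]
    new_x = [xi + dt * vi for xi, vi in zip(x, v)]
    new_v = [vi + dt * ai for vi, ai in zip(v, acc)]
    return new_x, new_v


def velocity_spread(v) -> float:
    """Largest velocity gap between any two agents."""
    return max(v) - min(v)


# Hypothetical initial configuration of four agents on a line.
x = [0.0, 1.0, 2.5, 4.0]
v = [1.0, -0.5, 0.3, 2.0]
initial_spread = velocity_spread(v)
initial_mean = sum(v) / len(v)

for _ in range(2000):
    x, v = cucker_smale_step(x, v)

final_spread = velocity_spread(v)
final_mean = sum(v) / len(v)

if __name__ == "__main__":
    print("velocity spread:", initial_spread, "->", final_spread)
    print("mean velocity:", initial_mean, "->", final_mean)
```

Because the weights are symmetric, the mean velocity is preserved while the velocity spread shrinks, which is the alignment behaviour the model is known for. Note that this plain model does nothing to prevent agents from crossing; collision avoidance requires additional structure of the kind the paper's title refers to.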
DebFlow: Automating Agent Creation via Agent Debate
Su, Jinwei, Xia, Yinghui, Shi, Ronghua, Wang, Jianhui, Huang, Jianuo, Wang, Yijin, Shi, Tianyu, Jingsong, Yang, He, Lewei
Large language models (LLMs) have demonstrated strong potential and impressive performance in automating the generation and optimization of workflows. However, existing approaches are marked by limited reasoning capabilities, high computational demands, and significant resource requirements. To address these issues, we propose DebFlow, a framework that employs a debate mechanism to optimize workflows and integrates reflexion to improve based on previous experiences. We evaluated our method across six benchmark datasets, including HotpotQA, MATH, and ALFWorld. Our approach achieved a 3% average performance improvement over the latest baselines, demonstrating its effectiveness in diverse problem domains. In particular, during training, our framework reduces resource consumption by 37% compared to the state-of-the-art baselines. Additionally, we performed ablation studies. Removing the Debate component resulted in a 4% performance drop across two benchmark datasets, significantly greater than the 2% drop observed when the Reflection component was removed. These findings strongly demonstrate the critical role of Debate in enhancing framework performance, while also highlighting the auxiliary contribution of reflexion to overall optimization.
AI2Agent: An End-to-End Framework for Deploying AI Projects as Autonomous Agents
Chen, Jiaxiang, Shi, Jingwei, Gan, Lei, Zhang, Jiale, Zhang, Qingyu, Zhang, Dongqian, Pang, Xin, Li, Zhucong, Xu, Yinghui
As AI technology advances, it is driving innovation across industries, increasing the demand for scalable AI project deployment. However, deployment remains a critical challenge due to complex environment configurations, dependency conflicts, cross-platform adaptation, and debugging difficulties, which hinder automation and adoption. This paper introduces AI2Agent, an end-to-end framework that automates AI project deployment through guideline-driven execution, self-adaptive debugging, and case & solution accumulation. AI2Agent dynamically analyzes deployment challenges, learns from past cases, and iteratively refines its approach, significantly reducing human intervention. To evaluate its effectiveness, we conducted experiments on 30 AI deployment cases, covering TTS, text-to-image generation, image editing, and other AI applications. Results show that AI2Agent significantly reduces deployment time and improves success rates. The code and demo video are now publicly accessible.