
Deep Reinforcement Learning for Drone Route Optimization in Post-Disaster Road Assessment

Gong, Huatian, Sheu, Jiuh-Biing, Wang, Zheng, Yang, Xiaoguang, Yan, Ran

arXiv.org Artificial Intelligence

Rapid post-disaster road damage assessment is critical for effective emergency response, yet traditional optimization methods suffer from excessive computational time and require domain knowledge for algorithm design, making them unsuitable for time-sensitive disaster scenarios. This study proposes an attention-based encoder-decoder model (AEDM) for rapid drone routing decisions in post-disaster road damage assessment. The method employs deep reinforcement learning to determine high-quality drone assessment routes without requiring algorithmic design knowledge. A network transformation method is developed to convert link-based routing problems into equivalent node-based formulations, while a synthetic road network generation technique addresses the scarcity of large-scale training datasets. The model is trained using policy optimization with multiple optima (POMO) with multi-task learning capabilities to handle diverse parameter combinations. Experimental results demonstrate two key strengths of AEDM: it outperforms commercial solvers by 20--71% and traditional heuristics by 23--35% in solution quality, while achieving rapid inference (1--2 seconds) versus 100--2,000 seconds for traditional methods. The model exhibits strong generalization across varying problem scales, drone numbers, and time constraints, consistently outperforming baseline methods on unseen parameter distributions and real-world road networks. The proposed method effectively balances computational efficiency with solution quality, making it particularly suitable for time-critical disaster response applications where rapid decision-making is essential for saving lives. The source code for AEDM is publicly available at https://github.com/PJ-HTU/AEDM-for-Post-disaster-road-assessment.
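The network transformation this abstract mentions — recasting a link-based assessment problem as a node-based one — can be sketched with a common construction: each road link becomes a node at its midpoint, and two link-nodes are adjacent when the links share an endpoint. This is an illustrative guess at the idea, not necessarily the paper's exact method; all names below are ours.

```python
def links_to_nodes(links, coords):
    """Hedged sketch of a link-to-node transformation (illustrative,
    not the paper's exact construction).
    links: list of (u, v) road segments; coords: {vertex: (x, y)}.
    Returns midpoint coordinates per link and an adjacency list over links."""
    mid = []
    for u, v in links:
        (x1, y1), (x2, y2) = coords[u], coords[v]
        mid.append(((x1 + x2) / 2, (y1 + y2) / 2))
    # two links are adjacent in the node-based graph if they share a vertex
    adj = {i: [] for i in range(len(links))}
    for i, (u1, v1) in enumerate(links):
        for j, (u2, v2) in enumerate(links):
            if i != j and {u1, v1} & {u2, v2}:
                adj[i].append(j)
    return mid, adj

# tiny triangle road network: three links over three intersections
mid, adj = links_to_nodes(
    [(0, 1), (1, 2), (2, 0)],
    {0: (0.0, 0.0), 1: (2.0, 0.0), 2: (1.0, 2.0)},
)
```

A node-based routing solver (such as an attention model over nodes) can then be run unchanged on `mid` and `adj`.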


Onboard Mission Replanning for Adaptive Cooperative Multi-Robot Systems

Kwan, Elim, Qureshi, Rehman, Fletcher, Liam, Laganier, Colin, Nockles, Victoria, Walters, Richard

arXiv.org Artificial Intelligence

Cooperative autonomous robotic systems have significant potential for executing complex multi-task missions across space, air, ground, and maritime domains. However, they commonly operate in remote, dynamic and hazardous environments, requiring rapid in-mission adaptation without reliance on fragile or slow communication links to centralised compute. Fast, on-board replanning algorithms are therefore needed to enhance resilience. Reinforcement Learning shows strong promise for efficiently solving mission planning tasks when formulated as Travelling Salesperson Problems (TSPs), but existing methods: 1) are unsuitable for replanning, where agents do not start at a single location; 2) do not allow cooperation between agents; 3) are unable to model tasks with variable durations; or 4) lack practical considerations for on-board deployment. Here we define the Cooperative Mission Replanning Problem as a novel variant of multiple TSP with adaptations to overcome these issues, and develop a new encoder/decoder-based model using Graph Attention Networks and Attention Models to solve it effectively and efficiently. Using a simple example of cooperative drones, we show our replanner consistently (90% of the time) maintains performance within 10% of the state-of-the-art LKH3 heuristic solver, whilst running 85-370 times faster on a Raspberry Pi. This work paves the way for increased resilience in autonomous multi-agent systems.
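The replanning setting described here — agents scattered at arbitrary positions, cooperating on tasks with service durations — can be illustrated with a naive greedy baseline. The paper's GAT/attention model replaces this kind of heuristic; the sketch below is only a minimal stand-in, and all names are ours.

```python
import math

def greedy_replan(agent_pos, tasks, durations):
    """Toy cooperative replanner (illustrative baseline, not the paper's
    learned model): agents start at distinct positions (not one depot);
    at each step the least-loaded agent takes its nearest remaining task,
    with load = travel time + task duration."""
    routes = [[] for _ in agent_pos]
    load = [0.0] * len(agent_pos)     # per-agent completion time so far
    pos = list(agent_pos)
    remaining = set(range(len(tasks)))
    while remaining:
        a = min(range(len(pos)), key=lambda i: load[i])
        t = min(remaining, key=lambda j: math.dist(pos[a], tasks[j]))
        load[a] += math.dist(pos[a], tasks[t]) + durations[t]
        pos[a] = tasks[t]
        routes[a].append(t)
        remaining.remove(t)
    return routes, max(load)          # makespan = min-max objective

routes, makespan = greedy_replan(
    [(0.0, 0.0), (10.0, 0.0)],        # two agents mid-mission
    [(1.0, 0.0), (9.0, 0.0)],         # two remaining tasks
    [0.0, 0.0],
)
```

On hardware like a Raspberry Pi, a learned policy amortises this per-step search into fast forward passes, which is the efficiency gap the abstract quantifies.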


An End-to-End Learning Approach for Solving Capacitated Location-Routing Problems

Miao, Changhao, Zhang, Yuntian, Wu, Tongyu, Deng, Fang, Chen, Chen

arXiv.org Artificial Intelligence

THIS WORK HAS BEEN SUBMITTED TO THE IEEE FOR POSSIBLE PUBLICATION. Abstract--The capacitated location-routing problems (CLRPs) are classical problems in combinatorial optimization, which require simultaneously making location and routing decisions. In CLRPs, the complex constraints and the intricate relationships between various decisions make the problem challenging to solve. With the emergence of deep reinforcement learning (DRL), it has been extensively applied to the vehicle routing problem and its variants, while research related to CLRPs remains underexplored. In this paper, we propose DRL with heterogeneous query (DRLHQ) to solve the CLRP and the open CLRP (OCLRP), respectively. We are the first to propose an end-to-end learning approach for CLRPs, following the encoder-decoder structure. In particular, we reformulate the CLRPs as a Markov decision process tailored to the various decisions, a general modeling framework that can be adapted to other DRL-based methods. To better handle the interdependency across location and routing decisions, we also introduce a novel heterogeneous querying attention mechanism designed to adapt dynamically to the various decision-making stages. Experimental results on both synthetic and benchmark datasets demonstrate the superior solution quality and better generalization performance of our proposed approach over representative traditional and DRL-based baselines in solving both the CLRP and the OCLRP. The facility location problem (FLP) and vehicle routing problem (VRP) are two critical combinatorial optimization problems (COPs) in transportation and logistics, which are traditionally addressed sequentially. However, planning routes after facility location may lead to suboptimal solutions due to the interdependencies across the various decisions [1], [2]. Therefore, the capacitated location-routing problems (CLRPs) [3] were proposed to make location and routing decisions simultaneously.
The CLRPs are one of the most classical topics in the operations research community and have extensive applications such as supply-chain management [4], emergency management [5], and disaster relief [6]. This work was supported by the National Key Research and Development Program of China No. 2022ZD0119703; in part by the National Natural Science Foundation of China (NSFC) under Grants 62273044 and 62022015; in part by the National Natural Science Foundation of China National Science Fund for Distinguished Young Scholars under Grant 62025301; in part by the National Natural Science Foundation of China Basic Science Center Program under Grant 62088101. In CLRPs, depots and vehicles are subject to maximum capacity constraints, and the depots are considered heterogeneous due to distinct capacities and opening costs. Meanwhile, we also study the open CLRP (OCLRP) [7], a variant of CLRP with open-ended routes.
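As a concrete, much simpler stand-in for the joint location-routing decisions this paper learns end-to-end, here is a naive greedy CLRP constructor: open depots cheapest-first per unit capacity, assign customers to the nearest open depot with spare capacity, then cut routes at the vehicle capacity. This is not the DRLHQ method; every heuristic choice below is illustrative.

```python
import math

def greedy_clrp(depots, customers, vcap):
    """Toy CLRP baseline (illustrative, not the paper's learned policy).
    depots: [(xy, capacity, opening_cost)]; customers: [(xy, demand)];
    vcap: vehicle capacity. Returns opened depot ids and routes, where
    each route is [("depot", i), customer_idx, ...]."""
    need = sum(d for _, d in customers)
    open_ids, cap = [], 0
    # open depots by opening cost per unit capacity until demand is covered
    for i in sorted(range(len(depots)), key=lambda i: depots[i][2] / depots[i][1]):
        open_ids.append(i)
        cap += depots[i][1]
        if cap >= need:
            break
    spare = {i: depots[i][1] for i in open_ids}
    assign = {i: [] for i in open_ids}
    # nearest open depot with enough spare capacity
    for c, (xy, d) in enumerate(customers):
        i = min((j for j in open_ids if spare[j] >= d),
                key=lambda j: math.dist(depots[j][0], xy))
        assign[i].append(c)
        spare[i] -= d
    routes = []
    for i, cs in assign.items():
        route, load = [("depot", i)], 0
        for c in cs:
            if load + customers[c][1] > vcap:   # start a new vehicle
                routes.append(route)
                route, load = [("depot", i)], 0
            route.append(c)
            load += customers[c][1]
        if len(route) > 1:
            routes.append(route)
    return open_ids, routes

open_ids, routes = greedy_clrp(
    [((0.0, 0.0), 10, 5.0), ((10.0, 0.0), 10, 20.0)],
    [((1.0, 0.0), 3), ((2.0, 0.0), 3)],
    vcap=10,
)
```

An OCLRP variant would simply drop the implicit return to the depot when costing each route.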


An Agentic Framework with LLMs for Solving Complex Vehicle Routing Problems

Zhang, Ni, Cao, Zhiguang, Zhou, Jianan, Zhang, Cong, Ong, Yew-Soon

arXiv.org Artificial Intelligence

Complex vehicle routing problems (VRPs) remain a fundamental challenge, demanding substantial expert effort for intent interpretation and algorithm design. While large language models (LLMs) offer a promising path toward automation, current approaches still rely on external intervention, which restricts autonomy and often leads to execution errors and low solution feasibility. To address these challenges, we propose an Agentic Framework with LLMs (AFL) for solving complex vehicle routing problems, achieving full automation from problem instance to solution. AFL directly extracts knowledge from raw inputs and enables self-contained code generation without handcrafted modules or external solvers. To improve trustworthiness, AFL decomposes the overall pipeline into three manageable subtasks and employs four specialized agents whose coordinated interactions enforce cross-functional consistency and logical soundness. Extensive experiments on 60 complex VRPs, ranging from standard benchmarks to practical variants, validate the effectiveness and generality of our framework, showing comparable performance against meticulously designed algorithms. Notably, it substantially outperforms existing LLM-based baselines in both code reliability and solution feasibility, achieving rates close to 100% on the evaluated benchmarks.
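The decomposed pipeline this abstract describes might be wired roughly as below. The real agents call an LLM (stubbed out here), and every role name and the check/repair loop are our assumptions, not AFL's actual design.

```python
# Hedged sketch of an agentic extract -> generate -> verify/repair pipeline
# (illustrative only; the real AFL agents and their coordination may differ).

def call_llm(prompt):
    """Placeholder for an actual LLM call; returns canned solver code."""
    return "def solve(instance): return []"

def extract_agent(raw_text):
    """Parse the raw problem statement into a structured spec."""
    return {"spec": raw_text.strip()}

def code_agent(spec):
    """Generate self-contained solver code from the spec."""
    return call_llm(f"Write a VRP solver for: {spec['spec']}")

def check_agent(code):
    """Verify the generated code executes and defines a solver entry point."""
    try:
        ns = {}
        exec(code, ns)
        return "solve" in ns
    except Exception:
        return False

def repair_agent(code):
    """On failure, ask for a corrected version."""
    return call_llm(f"Fix this code so it defines solve():\n{code}")

def afl_pipeline(raw_text, max_rounds=3):
    code = code_agent(extract_agent(raw_text))
    for _ in range(max_rounds):
        if check_agent(code):
            return code
        code = repair_agent(code)
    return None
```

The verify-then-repair loop is one plausible way a framework reaches the near-100% code-reliability rates the abstract reports.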



Hybrid-Balance GFlowNet for Solving Vehicle Routing Problems

Zhang, Ni, Cao, Zhiguang

arXiv.org Artificial Intelligence

Existing GFlowNet-based methods for vehicle routing problems (VRPs) typically employ Trajectory Balance (TB) to achieve global optimization but often neglect important aspects of local optimization. While Detailed Balance (DB) addresses local optimization more effectively, it alone falls short in solving VRPs, which inherently require holistic trajectory optimization. To address these limitations, we introduce the Hybrid-Balance GFlowNet (HBG) framework, which uniquely integrates TB and DB in a principled and adaptive manner by aligning their intrinsically complementary strengths. Additionally, we propose a specialized inference strategy for depot-centric scenarios like the Capacitated Vehicle Routing Problem (CVRP), leveraging the depot node's greater flexibility in selecting successors. Despite this specialization, HBG maintains broad applicability, extending effectively to problems without explicit depots, such as the Traveling Salesman Problem (TSP). We evaluate HBG by integrating it into two established GFlowNet-based solvers, i.e., AGFN and GFACS, and demonstrate consistent and significant improvements across both CVRP and TSP, underscoring the enhanced solution quality and generalization afforded by our approach.
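The TB and DB objectives this abstract combines can be written out as squared log-flow residuals: TB constrains the whole trajectory at once, DB constrains each transition locally. The fixed-weight mix below is only a stand-in for HBG's adaptive combination; the weighting name `alpha` is ours.

```python
def tb_loss(log_Z, log_pf, log_pb, log_reward):
    """Trajectory Balance: one squared residual over the full trajectory.
    log_pf / log_pb are per-step forward/backward log-probabilities."""
    return (log_Z + sum(log_pf) - log_reward - sum(log_pb)) ** 2

def db_loss(log_F, log_pf, log_pb):
    """Detailed Balance: one squared residual per transition, using
    log-flows log_F over the len(log_pf)+1 states of the trajectory."""
    return sum((log_F[t] + log_pf[t] - log_F[t + 1] - log_pb[t]) ** 2
               for t in range(len(log_pf)))

def hybrid_loss(log_Z, log_F, log_pf, log_pb, log_reward, alpha=0.5):
    """Fixed-weight stand-in for HBG's adaptive TB/DB combination
    (the paper's actual weighting is adaptive; alpha is illustrative)."""
    return (alpha * tb_loss(log_Z, log_pf, log_pb, log_reward)
            + (1 - alpha) * db_loss(log_F, log_pf, log_pb))
```

In a trainer, `log_pf`/`log_pb` come from the policy networks over a sampled route and `log_reward` from, e.g., negative tour length; both residuals vanish exactly when flows are consistent.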


Coordinated Multi-Drone Last-mile Delivery: Learning Strategies for Energy-aware and Timely Operations

Qin, Chuhao, Narayanan, Arun, Pournaras, Evangelos

arXiv.org Artificial Intelligence

Abstract--Drones have recently emerged as a faster, safer, and cost-efficient way for last-mile deliveries of parcels, particularly for urgent medical deliveries highlighted during the pandemic. This paper addresses a new challenge of multi-parcel delivery with a swarm of energy-aware drones, accounting for time-sensitive customer requirements. Each drone plans an optimal multi-parcel route within its battery-restricted flight range to minimize delivery delays and reduce energy consumption. The problem is tackled by decomposing it into three sub-problems: (1) optimizing depot locations and service areas using K-means clustering; (2) determining the optimal flight range for drones through reinforcement learning; and (3) planning and selecting multi-parcel delivery routes via a new optimized plan selection approach. To integrate these solutions and enhance long-term efficiency, we propose a novel algorithm leveraging actor-critic-based multi-agent deep reinforcement learning. Extensive experimentation using realistic delivery datasets demonstrates exceptional performance of the proposed algorithm. We provide new insights into economic efficiency (minimizing energy consumption), rapid operations (reducing delivery delays and overall execution time), and strategic guidance on depot deployment for practical logistics applications. Unmanned aerial vehicles (UAVs), commonly known as drones, have gained significant attention as a solution for last-mile delivery, especially in recent years [1]. For instance, the COVID-19 pandemic has highlighted the vulnerabilities of traditional delivery methods, as deliverymen risk spreading the virus. This was particularly problematic in quarantine zones, where customers faced difficulties in accessing logistics services [2], [3]. In contrast, drones offer a safer and more flexible alternative.
Due to their high mobility, carrying capacity, and accurate GPS navigation, drones are able to deliver parcels directly to small places such as doorways and balconies, avoiding human contact and traffic congestion.
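Sub-problem (1) above — depot placement and service areas via K-means — can be sketched in a few lines. Initialisation (naive first-k) and tie-breaking are our simplifications; a practical implementation would use k-means++ seeding.

```python
import math

def kmeans_depots(customers, k, iters=20):
    """Illustrative K-means for depot placement: each cluster centroid is
    a candidate depot and the cluster is its service area. Naive first-k
    initialisation (k-means++ would be used in practice)."""
    centers = [tuple(map(float, p)) for p in customers[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # assign each customer to its nearest depot candidate
        for p in customers:
            i = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        # move each depot to its cluster's centroid (keep it if empty)
        centers = [(sum(x for x, _ in c) / len(c),
                    sum(y for _, y in c) / len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# two obvious customer groups -> two depots between them
centers, clusters = kmeans_depots([(0, 0), (10, 0), (0, 1), (10, 1)], 2)
```

Sub-problems (2) and (3) then operate per cluster, which keeps each drone's route inside a battery-feasible service radius.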


Solving the Min-Max Multiple Traveling Salesmen Problem via Learning-Based Path Generation and Optimal Splitting

Wang, Wen, Wu, Xiangchen, Wang, Liang, Hu, Hao, Tao, Xianping, Zhang, Linghao

arXiv.org Artificial Intelligence

This study addresses the Min-Max Multiple Traveling Salesmen Problem ($m^3$-TSP), which aims to coordinate tours for multiple salesmen such that the length of the longest tour is minimized. Because the problem is NP-hard, exact solvers become impractical under the assumption that $P \ne NP$. As a result, learning-based approaches have gained traction for their ability to rapidly generate high-quality approximate solutions. Among these, two-stage methods combine learning-based components with classical solvers, simplifying the learning objective. However, this decoupling often disrupts consistent optimization, potentially degrading solution quality. To address this issue, we propose a novel two-stage framework named \textbf{Generate-and-Split} (GaS), which integrates reinforcement learning (RL) with an optimal splitting algorithm in a joint training process. The splitting algorithm offers near-linear scalability with respect to the number of cities and guarantees optimal splitting in Euclidean space for any given path. To facilitate the joint optimization of the RL component with the algorithm, we adopt an LSTM-enhanced model architecture to address partial observability. Extensive experiments show that the proposed GaS framework significantly outperforms existing learning-based approaches in both solution quality and transferability.
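For a fixed city order, an optimal min-max split into contiguous segments can be found by binary search on the answer with a greedy feasibility check. This is a generic sketch of the splitting idea, not necessarily GaS's near-linear algorithm; the tour model here (open segments, no depot legs) is our simplifying assumption.

```python
import math

def split_min_max(path, m):
    """Split an ordered path of cities into at most m contiguous segments
    so the longest segment's travel length is minimised. Generic binary
    search + greedy check (illustrative, not the paper's exact algorithm)."""
    legs = [math.dist(path[i], path[i + 1]) for i in range(len(path) - 1)]

    def feasible(T):
        # greedily pack legs into segments of length <= T
        parts, cur = 1, 0.0
        for leg in legs:
            if leg > T:
                return False
            if cur + leg > T:
                parts += 1
                cur = leg
            else:
                cur += leg
        return parts <= m

    lo, hi = 0.0, sum(legs)
    for _ in range(60):                 # bisect to float precision
        mid = (lo + hi) / 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# five collinear cities, two salesmen: best max segment length is 2
best = split_min_max([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)], 2)
```

Because the check is monotone in T, the search converges to the optimum for the given order; joint training then shapes the order the RL model generates.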