Learning to Search for Vehicle Routing with Multiple Time Windows

Xu, Kuan; Cao, Zhiguang; Zheng, Chenlong; Liu, Linong

arXiv.org Artificial Intelligence 

Acknowledgements: The work was supported by the National Natural Science Foundation of China [Grants 72471216, 72022018, 72091210] and the Youth Innovation Promotion Association, Chinese Academy of Sciences [Grant No. 2021454].

Keywords: Vehicle routing; Multiple time windows; Reinforcement learning; Unmanned vending machine replenishment

Abstract

In this study, we propose a reinforcement learning-based adaptive variable neighborhood search (RL-AVNS) method for solving the Vehicle Routing Problem with Multiple Time Windows (VRPMTW). Unlike traditional adaptive approaches that rely solely on historical operator performance, our method integrates a reinforcement learning framework to dynamically select neighborhood operators based on real-time solution states and learned experience. We introduce a fitness metric that quantifies customers' temporal flexibility to improve the shaking phase, and employ a transformer-based neural policy network to guide operator selection during the local search. Extensive computational experiments are conducted on realistic scenarios derived from the replenishment of unmanned vending machines, characterized by multiple clustered replenishment windows. Results demonstrate that RL-AVNS significantly outperforms traditional variable neighborhood search (VNS), adaptive VNS (AVNS), and state-of-the-art learning-based heuristics, achieving substantial improvements in solution quality and computational efficiency across various instance scales and time window complexities. Particularly notable is the algorithm's capability to generalize effectively to problem instances not encountered during training, underscoring its practical utility for complex logistics scenarios.

1. Introduction

Vehicle Routing Problems (VRPs) are fundamental to optimizing logistics and transportation systems. They are critical for ensuring timely and cost-effective deliveries in various industries, including e-commerce, healthcare, and food services (Vigo and Toth, 2014; Cordeau et al., 2002). In response to growing customer expectations for personalized services, logistics providers are increasingly offering flexible delivery options to improve service quality and maintain a competitive edge.