Learning Improvement Heuristics for Solving the Travelling Salesman Problem

Wu, Yaoxin, Song, Wen, Cao, Zhiguang, Zhang, Jie, Lim, Andrew

arXiv.org Artificial Intelligence 

Recent studies in using deep learning to solve the Travelling Salesman Problem (TSP) focus on construction heuristics, the solution of which may still be far from optimal-ity. To improve solution quality, additional procedures such as sampling or beam search are required. However, they are still based on the same construction policy, which is less effective in refining a solution. In this paper, we propose to directly learn the improvement heuristics for solving TSP based on deep reinforcement learning. We first present a reinforcement learning formulation for the improvement heuristic, where the policy guides selection of the next solution. Then, we propose a deep architecture as the policy network based on self-attention. Extensive experiments show that, improvement policies learned by our approach yield better results than state-of-the-art methods, even from random initial solutions. Moreover, the learned policies are more effective than the traditional handcrafted ones, and robust to different initial solutions with either high or poor quality. 1 Introduction The Travelling Salesman Problem (TSP) is a typical combinatorial optimization problem that has extensive applications in the real world. The problem statement is straightforward: given a set of locations, find the salesman a shortest tour that traverses each location exactly once and returns to the original one. Although having been widely studied for decades, achieving satisfactory performance is still challenging due to its NPhard complexity.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found