Learning Collaborative Policies to Solve NP-hard Routing Problems

Oct-10-2024, 11:43:40 GMT–Neural Information Processing Systems

Recently, deep reinforcement learning (DRL) frameworks have shown potential for solving NP-hard routing problems such as the traveling salesman problem (TSP) without problem-specific expert knowledge. Although DRL can be applied to such complex problems, DRL frameworks still struggle to compete with state-of-the-art heuristics, leaving a substantial performance gap. This paper proposes a novel hierarchical problem-solving strategy, termed learning collaborative policies (LCP), which can effectively find near-optimal solutions using two iterative DRL policies: the seeder and the reviser. The seeder generates candidate solutions (seeds) that are as diverse as possible while being dedicated to exploring the full combinatorial action space (i.e., sequences of assignment actions). To this end, we train the seeder's policy using a simple yet effective entropy regularization reward that encourages the seeder to find diverse solutions.
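The abstract does not give the exact form of the seeder's entropy regularization reward, so the following is only a minimal sketch of one common way to add an entropy bonus to a routing reward. It assumes the reward is the negative tour length plus a weighted sum of the Shannon entropies of the policy's per-step action distributions. The function names, the weight `beta`, and the additive form are all illustrative assumptions, not the paper's definition.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def tour_length(coords: Sequence[Point], tour: Sequence[int]) -> float:
    """Euclidean length of the closed tour that visits coords in the given order."""
    total = 0.0
    n = len(tour)
    for i in range(n):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % n]]  # wrap around to close the tour
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def step_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one action distribution; zero-probability terms add 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_regularized_reward(
    coords: Sequence[Point],
    tour: Sequence[int],
    step_probs: Sequence[Sequence[float]],
    beta: float = 0.1,  # hypothetical weight on the entropy bonus
) -> float:
    """Assumed form: negative tour length plus beta times the summed per-step entropy."""
    entropy_bonus = sum(step_entropy(p) for p in step_probs)
    return -tour_length(coords, tour) + beta * entropy_bonus


# Toy instance: four cities on the unit square, start city fixed, three decisions.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]
uniform_probs = [[1 / 3, 1 / 3, 1 / 3], [0.5, 0.5], [1.0]]
peaked_probs = [[0.98, 0.01, 0.01], [0.99, 0.01], [1.0]]

r_uniform = entropy_regularized_reward(coords, tour, uniform_probs)
r_peaked = entropy_regularized_reward(coords, tour, peaked_probs)
```

For the same tour, the more spread-out policy (`uniform_probs`) receives the higher reward under this assumed form, which is the mechanism by which an entropy term pushes a policy toward diverse candidate solutions.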

candidate solution, learning collaborative policy, solve np-hard routing problem

Industry:
- Transportation (0.86)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Computational Learning Theory (0.68)
  - Reinforcement Learning (0.62)