Appendix
In this paper, we consider various distributions for the node coordinates in VRPs, following which we randomly generate instances for both training and testing. Below we present details on how to generate those instances. Uniform distribution. It considers uniformly distributed nodes. Cluster distribution. It considers multiple (n_c) clusters, where we set n_c = 3. Explosion distribution. Instead of gathering all nodes towards the centroid as in the Implosion distribution, it moves nodes away from a circle (radius R_e = 0.3) and explodes them outside the circle, following the direction vector between the centroid ε_e and the corresponding nodes.
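The cluster and explosion generators described above can be sketched as follows. This is a hypothetical NumPy sketch, not the paper's code; the cluster spread (std = 0.07) and the push-to-rim rule for the explosion are assumptions, while n_c = 3 and radius 0.3 come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_instance(n, n_c=3, std=0.07):
    """n_c Gaussian clusters in the unit square (the spread std is an assumption)."""
    centroids = rng.random((n_c, 2))
    labels = rng.integers(n_c, size=n)
    return np.clip(centroids[labels] + rng.normal(0.0, std, size=(n, 2)), 0.0, 1.0)

def explosion_instance(n, radius=0.3):
    """Uniform nodes; those inside a random circle are pushed onto its rim,
    along the direction from the circle centre to each node."""
    coords = rng.random((n, 2))
    center = rng.random(2)
    vec = coords - center
    dist = np.linalg.norm(vec, axis=1)
    inside = dist < radius            # nodes to "explode" outward
    coords[inside] = center + radius * vec[inside] / dist[inside, None]
    return coords, center

pts, ctr = explosion_instance(200)
# After the push, no node lies strictly inside the circle.
```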
POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
In neural combinatorial optimization (CO), reinforcement learning (RL) can turn a deep neural net into a fast, powerful heuristic solver of NP-hard problems. This approach has great potential in practical applications because it allows near-optimal solutions to be found without guidance from experts armed with substantial domain knowledge. We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building such a heuristic solver. POMO is applicable to a wide range of CO problems and is designed to exploit the symmetries in the representation of a CO solution.
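A concrete instance of the solution symmetry mentioned above: a closed TSP tour can be written as N different permutations, one per starting node, yet all denote the same tour and earn the same reward. A minimal NumPy check (illustrative only, not the paper's code):

```python
import numpy as np

def tour_length(coords, tour):
    """Total Euclidean length of a closed TSP tour given as a node permutation."""
    ordered = coords[tour]
    diffs = ordered - np.roll(ordered, -1, axis=0)   # edge vectors, tour closed
    return float(np.sqrt((diffs ** 2).sum(axis=1)).sum())

rng = np.random.default_rng(0)
coords = rng.random((8, 2))        # 8 nodes in the unit square
tour = rng.permutation(8)

# Every cyclic rotation of the permutation is the same closed tour,
# so all 8 "different" solution sequences share one reward.
lengths = [tour_length(coords, np.roll(tour, k)) for k in range(8)]
```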
A Proof of Theorem 2.1
In this section, we prove Theorem 2.1. As we mentioned in Section 2.2, the reward R is a function of the solution. The remaining step is to show that Q(P) has an identical solution set with P, where M is the number of heterogeneous optimal solutions.

Table 6: Hyperparameter setting for POMO on TSP and CVRP.

                               TSP          CVRP
REINFORCE baseline           POMO shared baseline [23]
Learning rate                       1e-4
Weight decay                        1e-6
Number of encoder layers            6
Embedding dimension                 128
Number of attention heads           8
Feed-forward dimension              512
Batch size                          64
Epochs                         2,000        8,000
Epoch size                     100,000      10,000
Number of steps                3.125 M      1.25 M

Sym-NCO is a training scheme attached on top of an existing DRL-NCO model. We set identical Sym-NCO hyperparameters for Pointer-Net and AM across all tasks:

Table 7: Hyperparameter setting of Sym-NCO for Pointer-Net and AM.

α   0.1
β   0
K   1
L   10
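The "POMO shared baseline" in Table 6 replaces a learned critic with the mean reward over the N parallel rollouts of one instance. A minimal sketch of that advantage computation, with hypothetical reward values (the rollout rewards are made up for illustration):

```python
import numpy as np

# Hypothetical rewards from N = 8 parallel rollouts of one instance,
# each rollout forced to begin from a different start node (POMO).
rewards = np.array([-3.2, -3.0, -3.5, -2.9, -3.1, -3.3, -3.0, -3.4])

# POMO's shared baseline is the mean reward over the N rollouts.
baseline = rewards.mean()
advantages = rewards - baseline

# A REINFORCE update would weight each trajectory's log-prob gradient by
# its advantage; by construction the advantages sum to zero.
```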
Lifelong Learner: Discovering Versatile Neural Solvers for Vehicle Routing Problems
Feng, Shaodi, Lin, Zhuoyi, Zhou, Jianan, Zhang, Cong, Li, Jingwen, Chen, Kuan-Wen, Jayavelu, Senthilnath, Ong, Yew-Soon
Deep learning has been extensively explored to solve vehicle routing problems (VRPs), yielding a range of data-driven neural solvers with promising outcomes. However, most neural solvers are trained to tackle VRP instances in a relatively monotonous context, e.g., simplifying VRPs by using Euclidean distance between nodes and adhering to a single problem size, which harms their off-the-shelf application in different scenarios. To enhance their versatility, this paper presents a novel lifelong learning framework that incrementally trains a neural solver to manage VRPs in distinct contexts. Specifically, we propose a lifelong learner (LL), exploiting a Transformer network as the backbone, to solve a series of VRPs. An inter-context self-attention mechanism is proposed within LL to transfer the knowledge obtained from solving preceding VRPs into the succeeding ones. On top of that, we develop a dynamic context scheduler (DCS), employing cross-context experience replay to further help LL revisit the policies attained while solving preceding VRPs. Extensive results on synthetic and benchmark instances (problem sizes up to 18k) show that our LL is capable of discovering effective policies for tackling generic VRPs in varying contexts, outperforming other neural solvers and achieving the best performance on most VRPs.
Learning Generalizable Models for Vehicle Routing Problems via Knowledge Distillation (Appendix)
A Details of the considered distributions
In this paper, we consider various distributions for the node coordinates in VRPs, following which we randomly generate instances for both training and testing. Below we present details on how to generate those instances. Uniform distribution. It considers uniformly distributed nodes; an exemplary instance is displayed in Figure 1(i). Mixed distribution. It considers a mixture of the two distributions above, each with half of the nodes.
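The uniform and mixed generators can be sketched as below; this is an illustrative NumPy sketch, and the stand-in second component of the mixture (a clipped Gaussian blob) is an assumption, not the paper's exact counterpart distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_instance(n):
    """Node coordinates drawn i.i.d. from U[0, 1]^2."""
    return rng.random((n, 2))

def mixed_instance(n, other):
    """Half the nodes uniform, the other half from a second generator `other`."""
    half = n // 2
    coords = np.concatenate([uniform_instance(half), other(n - half)])
    rng.shuffle(coords)               # interleave the two halves
    return coords

# A stand-in second distribution: one Gaussian blob clipped to the unit square.
blob = lambda m: np.clip(0.5 + 0.1 * rng.standard_normal((m, 2)), 0.0, 1.0)
coords = mixed_instance(100, other=blob)
```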