A Proof of Theorem 2.1

Neural Information Processing Systems 

In this section, we prove Theorem 2.1. As we mentioned in Section 2.2, the reward R is a function of a solution. Then, the remaining step of the proof is to show that Q(P) has a solution set identical to that of P, where M is the number of heterogeneous optimal solutions.

Hyperparameter                TSP                         CVRP
REINFORCE baseline            POMO shared baseline [23]   POMO shared baseline [23]
Learning rate                 1e-4                        1e-4
Weight decay                  1e-6                        1e-6
Number of encoder layers      6                           6
Embedding dimension           128                         128
Number of attention heads     8                           8
Feed-forward dimension        512                         512
Batch size                    64                          64
Epochs                        2,000                       8,000
Epoch size                    100,000                     10,000
Number of steps               3.125 M                     1.25 M

Table 6: Hyperparameter setting for POMO on TSP and CVRP.

Sym-NCO is a training scheme that is attached on top of an existing DRL-NCO model. First of all, we set identical Sym-NCO hyperparameters for Pointer-Net and AM across all tasks:

Hyperparameter   Value
α                0.1
β                0
K                1
L                10

Table 7: Hyperparameter setting of Sym-NCO for Pointer-Net and AM.
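The "Number of steps" row in Table 6 follows from the other rows: total gradient steps = epochs × epoch size ÷ batch size. A minimal sketch of this sanity check (the helper name is ours, not from the paper):

```python
def total_steps(epochs, epoch_size, batch_size=64):
    """Gradient steps, assuming each epoch draws epoch_size instances in batches."""
    return epochs * epoch_size // batch_size

# TSP:  2,000 epochs x 100,000 instances / 64 per batch
tsp_steps = total_steps(2_000, 100_000)
# CVRP: 8,000 epochs x  10,000 instances / 64 per batch
cvrp_steps = total_steps(8_000, 10_000)

print(tsp_steps)   # 3125000  -> "3.125 M" in Table 6
print(cvrp_steps)  # 1250000  -> "1.25 M"  in Table 6
```

Both values match the "Number of steps" entries in Table 6 exactly.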
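The "POMO shared baseline [23]" in Table 6 refers to REINFORCE with a baseline shared across the N parallel rollouts of each instance (one rollout per starting node): the advantage of a rollout is its reward minus the mean reward of its siblings. A minimal sketch under that standard formulation (function name and array layout are our illustration, not the paper's code):

```python
import numpy as np

def pomo_advantages(rewards):
    """Shared-baseline advantages: subtract, per instance, the mean reward
    over its N parallel rollouts. rewards has shape (batch, N)."""
    rewards = np.asarray(rewards, dtype=float)
    baseline = rewards.mean(axis=1, keepdims=True)  # one baseline per instance
    return rewards - baseline                       # advantages sum to 0 per row

adv = pomo_advantages([[10.0, 12.0, 14.0]])
print(adv)  # [[-2.  0.  2.]]
```

Because the baseline is the in-batch rollout mean, no separate critic network is trained, which is why Table 6 lists no critic hyperparameters.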
