A Comparison with Other General MLCO Frameworks
–Neural Information Processing Systems
Since obtaining ground-truth labels is non-trivial for NP-hard combinatorial tasks, several efforts have developed general MLCO methods that require no ground-truth labels, including [8, 29, 30], our single-level baseline PPO-Single, and our proposed PPO-BiHyb. Here we compare these methods in terms of their model details and the problems they can handle, and we also discuss the limitations of each approach, including ours. For S2V-DQN [30] and NeuRewriter [8], training the RL model is challenging because of sparse rewards and large action spaces, especially on large-scale problems. Specifically, for a graph with m nodes, the action space of both S2V-DQN and NeuRewriter has size m. For most problems the number of decisions is proportional to m, so S2V-DQN requires O(m) actions before an episode terminates.
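To make the action-space and episode-length argument concrete, the toy sketch below models an S2V-DQN-style construction MDP on minimum vertex cover. Each step adds one node to a partial solution, and the episode ends once every edge is covered. The problem choice, the random policy, and the helper names `action_space` and `rollout` are all hypothetical illustrations. This is not the S2V-DQN or NeuRewriter code, and it does not model NeuRewriter's rewriting actions or any reward design.

```python
import random
from typing import List, Set, Tuple

# Hypothetical toy MDP (not the S2V-DQN implementation): node-by-node
# construction of a vertex cover on a graph with m nodes.
Edge = Tuple[int, int]


def action_space(m: int, chosen: Set[int]) -> List[int]:
    # Every node not yet selected is a candidate action, so |A| <= m.
    return [v for v in range(m) if v not in chosen]


def rollout(m: int, edges: List[Edge], seed: int = 0) -> Tuple[int, int]:
    """Run one episode with a random policy.

    Returns (largest action-space size seen, number of steps until termination).
    """
    rng = random.Random(seed)
    chosen: Set[int] = set()
    max_actions = 0
    steps = 0
    # Terminate only when every edge has at least one endpoint selected.
    while any(u not in chosen and v not in chosen for u, v in edges):
        candidates = action_space(m, chosen)
        max_actions = max(max_actions, len(candidates))
        chosen.add(rng.choice(candidates))
        steps += 1
    return max_actions, steps


if __name__ == "__main__":
    m = 50
    # Complete graph K_m: any vertex cover needs m - 1 nodes.
    complete = [(u, v) for u in range(m) for v in range(u + 1, m)]
    print(rollout(m, complete))  # (50, 49)
```

On the complete graph K_m, the first decision chooses among all m nodes, and the episode always lasts exactly m - 1 steps regardless of the policy. Both the per-step branching factor and the horizon therefore grow linearly with graph size, which is the scaling behaviour described above.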
Mar-21-2025, 02:04:38 GMT