POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
In neural combinatorial optimization (CO), reinforcement learning (RL) can turn a deep neural net into a fast, powerful heuristic solver of NP-hard problems. This approach has great potential in practical applications because it allows near-optimal solutions to be found without guidance from experts armed with substantial domain knowledge. We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building such a heuristic solver. POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution.
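The symmetry exploited here can be sketched concretely: the same optimal tour can be written starting from any of its nodes, so one instance yields many rollouts whose mean reward serves as a shared baseline. The sketch below (assumed function and variable names, not the authors' code) shows only that advantage computation, under the assumption of one reward per start-node rollout.

```python
import numpy as np

def pomo_advantages(rewards):
    """Hypothetical helper: REINFORCE advantages with POMO's shared baseline.

    rewards: array of shape (N,), one reward per rollout, where the N
    rollouts of one instance start from N different nodes. The baseline
    is the mean reward over those rollouts, so no critic network is needed.
    """
    baseline = rewards.mean()      # shared baseline across the N rollouts
    return rewards - baseline      # per-rollout advantage

# Toy usage: three rollouts of one instance with rewards 10, 12, 11.
adv = pomo_advantages(np.array([10.0, 12.0, 11.0]))
# The advantages sum to zero by construction.
```

Each advantage would then weight the log-probability of its rollout in the policy-gradient update; rollouts better than the instance's own average are reinforced.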
Review for NeurIPS paper: POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Correctness: The discussion of baselines for POMO is, to me, a bit misleading, though this is somewhat of a nit. First, the use of "traditionally" is incorrect. The earliest works (including, if I recall correctly, the REINFORCE paper) make use of a rolling average baseline. Newer works do use more complicated baselines, but for a reason!
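For context, the "rolling average baseline" the reviewer refers to can be sketched as an exponential moving average of past rewards; the class name and interface below are illustrative assumptions, not from any cited implementation.

```python
class RollingBaseline:
    """Illustrative sketch of an exponential-moving-average reward baseline."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha          # smoothing rate for the moving average
        self.value = 0.0            # current baseline estimate
        self.initialized = False

    def update(self, reward):
        """Update the baseline with a new reward; return the advantage."""
        if not self.initialized:
            self.value = reward     # initialize on the first observation
            self.initialized = True
        else:
            # Move the baseline a fraction alpha toward the new reward.
            self.value += self.alpha * (reward - self.value)
        return reward - self.value  # advantage used to scale the gradient

# Toy usage: the first reward sets the baseline, later rewards shift it.
b = RollingBaseline(alpha=0.5)
a1 = b.update(10.0)   # baseline becomes 10.0
a2 = b.update(20.0)   # baseline moves to 15.0
```

This is cheap and problem-agnostic, which is why later work only adopted learned or greedy-rollout baselines when they demonstrably reduced variance further.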
Review for NeurIPS paper: POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Three reviewers support accepting the paper; one argues for rejection. From the reviews, rebuttal, and discussion, the consensus seemed to be that the paper has an interesting new idea and good empirical results. The debate was around how much novelty there is and how likely the idea is to be useful in the future, which are slightly more subjective concerns. I recommend acceptance, and I hope future work will show that this was a valuable stepping stone. I still recommend that the authors revise the paper according to the reviewers' suggestions, in particular by not making overstated claims and by giving the reader broader context.