d1e7b08bdb7783ed4fb10abe92c22ffd-AuthorFeedback.pdf

Neural Information Processing Systems

After the k trajectories, one best trajectory is extracted by running without the exploration bonus, and that trajectory is "distilled" into the policy by performing a gradient update to increase its probability. The above work on solving combinatorial optimization problems using RL is based on the premise that there is room for improvement over traditional solvers. Please also note that the specific algorithm suggested is very similar to our "full bandit" baseline.
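The "distillation" step described in this excerpt can be illustrated with a minimal sketch. It assumes a tabular softmax policy, which the excerpt does not specify, and every name in it (`softmax`, `trajectory_log_prob`, `distill_step`, the learning rate, the toy trajectory) is hypothetical rather than taken from the authors' code. It shows only the general idea of one gradient-ascent step on the log-probability of a chosen trajectory.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def trajectory_log_prob(logits, trajectory):
    """Log-probability of a trajectory of (state, action) pairs
    under a tabular softmax policy (assumed parameterization)."""
    return sum(math.log(softmax(logits[s])[a]) for s, a in trajectory)


def distill_step(logits, trajectory, lr=0.5):
    """One gradient-ascent step on the trajectory's log-probability.

    For a softmax policy, d log pi(a|s) / d logit[s][b] = 1[b == a] - pi(b|s).
    Gradients are accumulated under the current policy, then applied once.
    """
    grads = [[0.0] * len(row) for row in logits]
    for s, a in trajectory:
        probs = softmax(logits[s])
        for b, p in enumerate(probs):
            grads[s][b] += (1.0 if b == a else 0.0) - p
    return [
        [x + lr * g for x, g in zip(row, grad_row)]
        for row, grad_row in zip(logits, grads)
    ]


# Toy setup (hypothetical): 3 states, 3 actions, uniform initial policy.
logits = [[0.0, 0.0, 0.0] for _ in range(3)]
best_trajectory = [(0, 1), (1, 0), (2, 2)]  # stand-in for the extracted best trajectory

before = trajectory_log_prob(logits, best_trajectory)
logits_after = distill_step(logits, best_trajectory)
after = trajectory_log_prob(logits_after, best_trajectory)
print(f"log-prob before: {before:.4f}, after: {after:.4f}")
```

In this toy setting the update raises the log-probability of the chosen trajectory. A neural policy would replace the hand-written gradient with automatic differentiation.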




Learning to Mutate with Hypergradient Guided Population

Neural Information Processing Systems

To address the above challenges, we propose a novel hyperparameter mutation (HPM) scheduling algorithm in this study, which adopts a population-based training framework to explicitly learn a trade-off (i.e., a mutation schedule) between using the hypergradient-guided local search and the mutation-driven global search.