Reviews: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning

Neural Information Processing Systems 

Originality: The use of graph neural networks appears novel (concurrent with Paliwal), as does the sweep order (for which I don't know other papers, at least for this application of graph neural networks). The trick of using architecture search as a dataset also seems novel, and I'm quite happy with this idea.

Quality: The submission is sound, but I have a few minor concerns:

1. It's possible REINFORCE is good enough, but I'm skeptical given that (1) REINFORCE is much worse in normal RL environments and (2) the paper explicitly presents evidence that using an incremental baseline helps learning. The learned value function in PPO, Q-learning, etc. could potentially play the same variance reduction role or even do quite a lot better (presumably not all of the variance due to upstream moves is explained by reward so far).
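
To make the variance-reduction point concrete, here is a minimal sketch on a toy two-action bandit. It is not the paper's setup: the rewards, the single-logit policy, and the helper name `estimator_stats` are all assumptions chosen for illustration. It computes the exact mean and variance of the single-sample REINFORCE gradient estimate with and without a constant baseline. A learned, state-dependent value function, as in PPO, would take the place of the constant.

```python
import math

def estimator_stats(theta, rewards, baseline):
    """Exact mean and variance of the one-sample REINFORCE gradient estimate
    (r - baseline) * d/dtheta log pi(a) for a two-action sigmoid policy."""
    p1 = 1.0 / (1.0 + math.exp(-theta))   # probability of choosing action 1
    probs = [1.0 - p1, p1]
    # Score function d/dtheta log pi(a): -p1 for action 0, (1 - p1) for action 1.
    score = [-p1, 1.0 - p1]
    samples = [(rewards[a] - baseline) * score[a] for a in (0, 1)]
    mean = sum(probs[a] * samples[a] for a in (0, 1))
    var = sum(probs[a] * (samples[a] - mean) ** 2 for a in (0, 1))
    return mean, var

# Hypothetical rewards: a large shared offset with a small gap between actions.
rewards = [10.0, 11.0]
theta = 0.0

# Baseline = expected reward under the current policy (known exactly in this toy case).
p1 = 1.0 / (1.0 + math.exp(-theta))
expected_reward = (1.0 - p1) * rewards[0] + p1 * rewards[1]

mean_plain, var_plain = estimator_stats(theta, rewards, baseline=0.0)
mean_base, var_base = estimator_stats(theta, rewards, baseline=expected_reward)

print("no baseline:   mean", mean_plain, "variance", var_plain)
print("with baseline: mean", mean_base, "variance", var_base)
```

Both estimators have the same expected gradient, but the baseline removes the variance caused by the shared reward offset. The open question in the concern above is how much further a learned value function could go beyond a simple running baseline.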