Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization
Training deep neural networks requires an exorbitant amount of computational resources, including a heterogeneous mix of GPU and CPU devices. It is critical to place the operations of a neural network on these devices in an optimal way, so that training completes in the shortest possible time. The state-of-the-art uses reinforcement learning to learn placement skills by repeatedly performing Monte-Carlo experiments. However, because it treats all placement samples equally, we argue that ample room remains for significant improvement. In this paper, we propose a new joint learning algorithm, called Post, that integrates cross-entropy minimization and proximal policy optimization to achieve theoretically guaranteed optimal efficiency. To incorporate the cross-entropy method as a sampling technique, we propose to represent placements using discrete probability distributions, which allows us to estimate an optimal probability mass by maximum likelihood estimation. We have implemented Post on the Google Cloud platform, and our extensive experiments with several popular neural network training benchmarks demonstrate clear evidence of superior performance: with the same amount of learning time, Post produces placements whose training times are up to 63.7% shorter than the state-of-the-art.
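The representation described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes hypothetical sizes for the operation graph and device pool, and models each operation's placement as a softmax (categorical) distribution over devices, from which a placement is one draw per op. The log-likelihood computed at the end is the quantity a policy-gradient method such as PPO would optimize.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_OPS, NUM_DEVICES = 4, 3  # hypothetical graph / device-pool sizes

# Learnable logits; each softmax row is a discrete probability
# distribution over devices for one operation.
logits = rng.normal(size=(NUM_OPS, NUM_DEVICES))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# A placement assigns one device to every operation, sampled
# independently from each op's categorical distribution.
placement = np.array(
    [rng.choice(NUM_DEVICES, p=probs[op]) for op in range(NUM_OPS)]
)

# Log-likelihood of the sampled placement under the current
# distributions -- what a policy-gradient update differentiates.
log_prob = np.log(probs[np.arange(NUM_OPS), placement]).sum()
print(placement, log_prob)
```

Because the per-op distributions are independent, the placement's log-likelihood factors into a simple sum, which is what makes the maximum likelihood refit in the cross-entropy method tractable.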
Reviews: Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization
This is great work: it tackles an important problem, graph partitioning in heterogeneous, multi-device settings. An increasing number of problems could benefit from resource allocation optimization techniques such as the one described in this work. ML, and specifically RL, techniques have recently been developed to solve the device placement problem. This work addresses one of the main deficiencies of the prior work by making it more sample-efficient (as demonstrated by the empirical results). The novelty is in how the placement parameters are trained: as opposed to directly training a placement policy for the best runtime, a softmax is used to model the distribution of op placements over the pool of available devices.
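The sampling-and-refit idea the review describes can be sketched as a standard cross-entropy method loop. This is a toy sketch under stated assumptions, not the paper's algorithm: the cost model, batch size, elite fraction, and smoothing factor are all invented for illustration, and a stand-in runtime replaces real on-device measurement. Each iteration samples placements from the current per-op distributions, keeps the elite (shortest-runtime) samples, and refits the distributions by maximum likelihood, i.e. empirical device frequencies among the elites.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_OPS, NUM_DEVICES = 6, 3        # hypothetical graph / device-pool sizes
BATCH, ELITE_FRAC, ITERS = 50, 0.2, 30

# Start from uniform categorical distributions, one per op.
probs = np.full((NUM_OPS, NUM_DEVICES), 1.0 / NUM_DEVICES)

def runtime(placement):
    """Stand-in cost model: pretend op i runs fastest on device i % NUM_DEVICES."""
    return float(np.sum(placement != np.arange(NUM_OPS) % NUM_DEVICES))

for _ in range(ITERS):
    # Monte-Carlo step: sample a batch of placements.
    samples = np.stack([
        [rng.choice(NUM_DEVICES, p=probs[op]) for op in range(NUM_OPS)]
        for _ in range(BATCH)
    ])
    costs = np.array([runtime(s) for s in samples])
    # Keep only the elite samples (unequal treatment of samples).
    elite = samples[np.argsort(costs)[: int(ELITE_FRAC * BATCH)]]
    # Maximum-likelihood refit with light smoothing to avoid
    # prematurely zeroing out any device.
    for op in range(NUM_OPS):
        counts = np.bincount(elite[:, op], minlength=NUM_DEVICES)
        probs[op] = 0.9 * counts / counts.sum() + 0.1 / NUM_DEVICES

best = probs.argmax(axis=1)
print(best, runtime(best))
```

The key contrast with plain policy-gradient training is visible in the elite-selection line: only the best placements contribute to the update, rather than weighting every sample by its return.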
Authors: Gao, Yuanxiang, Chen, Li, Li, Baochun