Decentralized Distributed PPO: Solving PointGoal Navigation
Wijmans, Erik, Kadian, Abhishek, Morcos, Ari, Lee, Stefan, Essa, Irfan, Parikh, Devi, Savva, Manolis, Batra, Dhruv
arXiv.org Artificial Intelligence
DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever 'stale'), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim (Savva et al., 2019), DD-PPO exhibits near-linear scaling, achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience): over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially 'solves' the task: near-perfect autonomous navigation in an unseen environment without access to a map, directly from an RGB-D camera and a GPS+Compass sensor. Fortuitously, error vs. computation exhibits a power-law-like distribution; thus, 90% of peak performance is obtained relatively early (at 100 million steps) and relatively cheaply (under 1 day with 8 GPUs). Finally, we show that the scene understanding and navigation policies learned can be transferred to other navigation tasks, the analog of 'ImageNet pre-training + task-specific fine-tuning' for embodied AI. Our model outperforms ImageNet pre-trained CNNs on these transfer tasks and can serve as a universal resource (all models and code will be publicly available).

1 INTRODUCTION

Recent advances in deep reinforcement learning (RL) have given rise to systems that can outperform human experts at a variety of games (Silver et al., 2017; Tian et al., 2019; OpenAI, 2018). These advances, even more so than those from supervised learning, rely on significant numbers of training samples, making them impractical without large-scale, distributed parallelization. Thus, scaling RL via multi-node distribution is of importance to AI; that is the focus of this work.
Several works have proposed systems for distributed RL (Heess et al., 2017; Liang et al., 2018; Tian et al., 2019; Silver et al., 2016; OpenAI, 2018; Espeholt et al., 2018). These works utilize two core components: 1) workers that collect experience ('rollout workers'), and 2) a parameter server that optimizes the model. The rollout workers are then distributed across, potentially, thousands of CPUs.
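To make the contrast between a parameter server and a decentralized, synchronous scheme concrete, the following is a minimal single-process sketch. It is not the authors' implementation: a 1-D least-squares problem stands in for the policy, the "workers" are plain Python values, and `all_reduce_mean` is a hypothetical stand-in for a collective gradient average. The excerpt above does not state which communication primitive DD-PPO uses, so that choice is an assumption made for illustration.

```python
"""Toy comparison: parameter server vs. decentralized synchronous updates.

Illustrative only. All names here are hypothetical; nothing is taken from
the DD-PPO code base.
"""
import random


def local_gradient(w, shard):
    # Gradient of the mean of 0.5 * (w*x - y)^2 over one worker's data shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def parameter_server_step(w, shards, lr):
    # Workers send gradients to one central server that owns the parameters.
    grads = [local_gradient(w, shard) for shard in shards]
    mean = sum(grads) / len(grads)
    return w - lr * mean


def all_reduce_mean(values):
    # Assumed stand-in for a collective: every worker receives the same mean.
    mean = sum(values) / len(values)
    return [mean] * len(values)


def decentralized_step(worker_ws, shards, lr):
    # Each worker keeps its own parameter copy; there is no central server.
    grads = [local_gradient(w, shard) for w, shard in zip(worker_ws, shards)]
    averaged = all_reduce_mean(grads)
    # Every worker applies the identical update, so the copies stay in sync.
    return [w - lr * g for w, g in zip(worker_ws, averaged)]


def run(num_workers=4, steps=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Synthetic data for y = 3x, one shard of 8 points per worker.
    shards = [[(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(8))]
              for _ in range(num_workers)]
    w_server = 0.0
    worker_ws = [0.0] * num_workers
    for _ in range(steps):
        w_server = parameter_server_step(w_server, shards, lr)
        worker_ws = decentralized_step(worker_ws, shards, lr)
    return w_server, worker_ws


if __name__ == "__main__":
    w_server, worker_ws = run()
    print("parameter server:", w_server)
    print("decentralized workers:", worker_ws)
```

In this toy setting both schemes compute the same averaged gradient at every step, so the decentralized workers hold identical parameters without any central copy. Because the update is synchronous, no worker ever applies a gradient computed from outdated parameters, which is the sense in which nothing is 'stale'.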
Nov-1-2019