PaddlePaddle Fluid: Elastic Deep Learning on Kubernetes - Baidu Research


Two open source communities are announcing the Elastic Deep Learning (EDL) feature in PaddlePaddle's new release, codenamed Fluid: PaddlePaddle, the deep learning framework that originated at Baidu, and Kubernetes, the most famous scheduler for containerized applications. Fluid EDL includes a Kubernetes controller, the PaddlePaddle auto-scaler, which changes the number of processes in distributed jobs according to the idle hardware resources in the cluster, and a new fault-tolerant architecture, as described in the PaddlePaddle design doc.

Industrial deep learning requires significant computation power. Research labs and companies often build GPU clusters managed by SLURM, MPI, or SGE. These clusters either run a submitted job if it requires fewer resources than are currently idle, or leave the job pending for an unpredictably long time.
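To make the contrast concrete, here is a minimal Python sketch of the two scheduling behaviors described above. It is only an illustration under stated assumptions, not the EDL controller's actual algorithm: the `Job` class, the `fixed_schedule` and `elastic_rescale` functions, the one-process-per-GPU model, and the round-robin policy are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    """Hypothetical job description; one worker process is assumed to use one GPU."""
    name: str
    min_workers: int  # fewest processes the job can run with
    max_workers: int  # most processes the job can make use of
    workers: int = 0  # processes currently assigned


def fixed_schedule(job_size: int, idle_gpus: int) -> bool:
    """Fixed-size policy: run only if the whole request fits, otherwise stay pending."""
    return job_size <= idle_gpus


def elastic_rescale(jobs: List[Job], total_gpus: int) -> None:
    """Toy elastic policy (an assumption, not EDL's real one).

    Admit each job at its minimum size, then hand out the remaining
    idle GPUs one at a time, round-robin, to the jobs that are running.
    """
    free = total_gpus
    for job in jobs:
        job.workers = 0

    # First pass: admit jobs at their minimum size, in submission order.
    for job in jobs:
        if job.min_workers <= free:
            job.workers = job.min_workers
            free -= job.min_workers

    # Second pass: grow running jobs until GPUs run out or every job is at its maximum.
    progress = True
    while free > 0 and progress:
        progress = False
        for job in jobs:
            if free > 0 and 0 < job.workers < job.max_workers:
                job.workers += 1
                free -= 1
                progress = True
```

With a fixed-size policy, a job asking for 8 GPUs stays pending while only 4 are idle. Under the toy elastic policy, the same job could start with fewer processes and grow later as other jobs finish and free their GPUs.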
