r/devops - FfDL: A Flexible Multi-tenant Deep Learning Platform
Deep learning (DL) is becoming increasingly popular in several application domains and has made several new application features involving computer vision, speech recognition and synthesis, self-driving automobiles, drug design, etc. feasible and accurate. As a result, large scale "on-premise" and "cloud-hosted" deep learning platforms have become essential infrastructure in many organizations. These systems accept, schedule, manage and execute DL training jobs at scale. This paper describes the design, implementation and our experiences with FfDL, a DL platform used at IBM. We describe how our design balances dependability with scalability, elasticity, flexibility and efficiency.
artificial intelligence, flexible multi-tenant deep learning platform, machine learning
Sep-23-2019, 00:58:07 GMT