

r/devops - FfDL: A Flexible Multi-tenant Deep Learning Platform

#artificialintelligence

Deep learning (DL) is becoming increasingly popular in several application domains and has made several new application features involving computer vision, speech recognition and synthesis, self-driving automobiles, drug design, etc. feasible and accurate. As a result, large scale "on-premise" and "cloud-hosted" deep learning platforms have become essential infrastructure in many organizations. These systems accept, schedule, manage and execute DL training jobs at scale. This paper describes the design, implementation and our experiences with FfDL, a DL platform used at IBM. We describe how our design balances dependability with scalability, elasticity, flexibility and efficiency.


H2O-3 on FfDL: Bringing deep learning and machine learning closer together

#artificialintelligence

This post is co-authored by Animesh Singh, Nicholas Png, Tommy Li, and Vinod Iyengar. Deep learning frameworks like TensorFlow, PyTorch, Caffe, MXNet, and Chainer have reduced the effort and skills needed to train and use deep learning models. But for AI developers and data scientists, it's still a challenge to set up and use these frameworks in a consistent manner for distributed model training and serving. The open source Fabric for Deep Learning (FfDL) project provides a consistent way for AI developers and data scientists to use deep learning as a service on Kubernetes, and to run distributed deep learning training from Jupyter notebooks for models written in any of these frameworks. Now, FfDL is announcing a new addition that brings together that deep learning training capability with state-of-the-art machine learning methods.


IBM lures developers with AI and machine learning projects

#artificialintelligence

As part of this expansion, IBM added more data scientists and AI engineers, which has resulted in new projects, such as the Model Asset eXchange (MAX) and the Fabric for Deep Learning (FfDL), which is pronounced "fiddle." MAX is an open source ecosystem for data scientists and AI developers to share and consume models that use machine learning engines, such as TensorFlow, PyTorch and Caffe2, Diaz said. It also provides a standard approach to classify, annotate, and deploy these models for prediction and inferencing. Additionally, developers can train and deploy MAX models for production workloads that use Watson Studio, such as internet-of-things applications, said Guido Jouret, chief digital officer at ABB. IBM's MAX not only saves developers the cost and time of creating these models themselves, but also gives them access to the open source community to continually add to and improve on these models, said Kathleen Walch, senior analyst at Cognilytica, based in Washington, D.C. "It helps level the playing field for smaller companies [that] don't have as much data or resources," she said. Meanwhile, FfDL presents a cloud-native service for popular open source frameworks TensorFlow, Caffe and PyTorch.


IBM/FfDL

@machinelearnbot

This repository contains the core services of the FfDL (Fabric for Deep Learning) platform. FfDL is an operating system "fabric" for deep learning. Once installed, use the command make minikube to start Minikube and set up local network routes. The minimum recommended capacity for FfDL is 4 GB of memory and 2 CPUs. If you already have a FfDL deployment up and running, you can jump to the FfDL User Guide to use FfDL for training your deep learning models. If you are getting started and want to set up your own FfDL deployment, please follow the steps below.
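The local setup described above can be sketched as a short shell session. This is a minimal sketch of a deployment walkthrough, not the repository's verbatim instructions: the make minikube target and the IBM/FfDL repository path come from the text above, while the explicit minikube resource flags are an assumption based on the stated 4 GB / 2 CPU minimum and should be checked against the current README.

```shell
# Start Minikube with the minimum recommended capacity for FfDL
# (assumed flags; 4 GB of memory and 2 CPUs per the README).
minikube start --memory 4096 --cpus 2

# Fetch the FfDL core services (repository path from the snippet above).
git clone https://github.com/IBM/FfDL.git
cd FfDL

# Start Minikube-specific setup and local network routes,
# then deploy the FfDL services into the cluster.
make minikube
```

After the services come up, the FfDL User Guide covers submitting training jobs against the running deployment.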


Fabric for Deep Learning

#artificialintelligence

According to Gartner, artificial intelligence will be the most disruptive class of technology over the next 10 years due to radical computational power, near-endless amounts of data, and unprecedented advances in deep learning. The rise of deep learning has been fueled by three recent trends: the explosion in the amount of training data; the use of accelerators such as graphics processing units (GPUs); and the advancement in training algorithms and neural network architectures. To realize the full potential of this rising trend, we want the technology to be easily accessible to the people it matters most to: data scientists and AI developers. Training deep neural networks, known as deep learning, is currently highly complex and computationally intensive. It requires a highly tuned system with the right combination of software, drivers, compute, memory, network, and storage resources.