 tf-replicator


DeepMind TF-Replicator Simplifies Model Deployment on Cluster Architectures

#artificialintelligence

DeepMind's Research Platform Team has open-sourced TF-Replicator, a framework that enables researchers with no previous experience of distributed systems to deploy their TensorFlow models on GPUs and Cloud TPUs. The move aims to strengthen AI research and development. Synced invited Yuan Tang, a senior software engineer at Ant Financial, to share his thoughts on TF-Replicator. How would you describe TF-Replicator? TF-Replicator is a framework that simplifies writing distributed TensorFlow code for training machine learning models, so that they can be effortlessly deployed to different cluster architectures.


Two New Frameworks that Google and DeepMind are Using to Scale Deep Learning Workflows

#artificialintelligence

"Your greatest strength can become your biggest weakness," says the old proverb, and that certainly applies to deep learning models. The entire deep learning space was made possible in part by the ability of deep neural networks to scale across GPU topologies. However, that same ability to scale has produced computationally intensive programs that are operationally challenging for most organizations. From training to optimization, the lifecycle of deep learning programs requires robust infrastructure building blocks to parallelize and scale computation workloads. While deep learning frameworks are evolving at a rapid pace, the corresponding infrastructure models remain relatively nascent.


Racist self-driving car scare debunked, inside AI black boxes, Google helps folks go with the TensorFlow...

#artificialintelligence

Roundup Hello, here's a quick recap on all the latest AI-related news beyond what we've already reported this week. You may have seen news reports that autonomous cars are unlikely to detect pedestrians crossing the road if they have dark skin, and thus run them over. And yes, the internal alarm bells in your head should be going off, as a closer look at the research behind the stories shows all those headlines screaming about racist AI are a little off the mark. The academic paper at the heart of the matter described a series of experiments testing different computer vision models, such as the Faster R-CNN model and R-50-FPN, on images of pedestrians with different skin tones. The study's authors, based at the Georgia Institute of Technology in the US, described how they paid humans to look through the collection of roughly 3,500 photos, and individually tag people in the snaps as either "LS" for light skin or "DS" for dark skin, and then trained the neural networks using this dataset.


TF-Replicator: Distributed Machine Learning for Researchers (DeepMind)

#artificialintelligence

At DeepMind, the Research Platform Team builds infrastructure to empower and accelerate our AI research. Today, we are excited to share how we developed TF-Replicator, a software library that helps researchers deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience with distributed systems. This blog post gives an overview of the ideas and technical challenges underlying TF-Replicator. For a more comprehensive description, please read our arXiv paper. A recurring theme in recent AI breakthroughs -- from AlphaFold to BigGAN to AlphaStar -- is the need for effortless and reliable scalability.


TF-Replicator: Distributed Machine Learning for Researchers

arXiv.org Machine Learning

We describe TF-Replicator, a framework for distributed machine learning designed for DeepMind researchers and implemented as an abstraction over TensorFlow. TF-Replicator simplifies writing data-parallel and model-parallel research code. The same models can be effortlessly deployed to different cluster architectures (i.e., one or many machines containing CPUs, GPUs or TPU accelerators) using synchronous or asynchronous training regimes. To demonstrate the generality and scalability of TF-Replicator, we implement and benchmark three very different models: (1) a ResNet-50 for ImageNet classification, (2) an SN-GAN for class-conditional ImageNet image generation, and (3) a D4PG reinforcement learning agent for continuous control. Our results show strong scalability performance without demanding any distributed systems expertise of the user. The TF-Replicator programming model will be open-sourced as part of TensorFlow 2.0 (see https://github.com/tensorflow/community/pull/25).
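
To make the "synchronous data-parallel" regime mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not TF-Replicator code and does not use any TensorFlow API. The one-parameter linear model and the names `local_gradient`, `all_reduce_mean`, `split_batch`, and `sync_data_parallel_step` are all hypothetical, chosen only to illustrate the idea: each replica computes a gradient on its own shard of the batch, the gradients are averaged across replicas, and every replica applies the same update.

```python
# Illustrative sketch only: plain Python, NOT the TF-Replicator API.
# Assumed toy model: y_hat = w * x with loss 0.5 * (w * x - y) ** 2.
from typing import List, Tuple

Example = Tuple[float, float]  # (x, y) pair


def local_gradient(w: float, shard: List[Example]) -> float:
    """Mean gradient of the loss w.r.t. w over one replica's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for a cross-replica all-reduce: average the per-replica values."""
    return sum(values) / len(values)


def split_batch(batch: List[Example], num_replicas: int) -> List[List[Example]]:
    """Split a batch into equally sized, contiguous shards, one per replica."""
    if num_replicas <= 0 or len(batch) % num_replicas != 0:
        raise ValueError("batch size must be a positive multiple of num_replicas")
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def sync_data_parallel_step(
    w: float, batch: List[Example], num_replicas: int, learning_rate: float
) -> float:
    """One synchronous step: per-replica gradients, averaged, then one SGD update."""
    shards = split_batch(batch, num_replicas)
    grads = [local_gradient(w, shard) for shard in shards]  # would run in parallel
    return w - learning_rate * all_reduce_mean(grads)


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(sync_data_parallel_step(0.0, data, num_replicas=1, learning_rate=0.1))
    print(sync_data_parallel_step(0.0, data, num_replicas=2, learning_rate=0.1))
```

Because the shards are equally sized, the averaged per-replica gradient equals the full-batch gradient, so the two calls in the usage example produce the same updated weight. An asynchronous regime would instead let each replica apply its gradient without waiting for the others.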