Dataflow architectures are general computation engines optimized for the execution of fine-grain parallel algorithms. Neural networks can be simulated on these systems with certain advantages. In this paper, we review dataflow architectures, examine neural network simulation performance on a new generation dataflow machine, compare that performance to other simulation alternatives, and discuss the benefits and drawbacks of the dataflow approach.
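The defining property of a dataflow machine is that an operation fires as soon as all of its input operands (tokens) have arrived, with no program counter ordering independent work. The sketch below illustrates that firing rule in plain Python; the class and function names are illustrative, not taken from any real dataflow architecture, and a real machine would fire ready nodes in parallel rather than from a single work list.

```python
# Minimal sketch of dataflow ("fire when ready") execution.
# A node fires as soon as all of its input tokens have arrived,
# so independent nodes are eligible to execute in parallel.

class Node:
    def __init__(self, name, op, n_inputs):
        self.name = name
        self.op = op
        self.n_inputs = n_inputs
        self.tokens = {}        # input slot -> arrived value
        self.consumers = []     # (node, input slot) pairs fed by our output

    def receive(self, slot, value, ready):
        self.tokens[slot] = value
        if len(self.tokens) == self.n_inputs:   # the firing rule
            ready.append(self)

def run(sources):
    """Execute the graph; 'sources' are nodes whose tokens are already present."""
    ready, results = list(sources), {}
    while ready:
        node = ready.pop()
        out = node.op(*(node.tokens[i] for i in range(node.n_inputs)))
        results[node.name] = out
        for consumer, slot in node.consumers:
            consumer.receive(slot, out, ready)
    return results

# Graph for (2 + 3) * (2 - 1): the add and sub nodes are independent,
# so a fine-grain parallel machine could fire them simultaneously.
add = Node("add", lambda a, b: a + b, 2)
sub = Node("sub", lambda a, b: a - b, 2)
mul = Node("mul", lambda a, b: a * b, 2)
add.consumers = [(mul, 0)]
sub.consumers = [(mul, 1)]
add.tokens = {0: 2, 1: 3}
sub.tokens = {0: 2, 1: 1}

results = run([add, sub])
print(results["mul"])   # (2 + 3) * (2 - 1) = 5
```

The same token-driven scheduling is what lets a dataflow machine expose all of an algorithm's instruction-level parallelism without explicit thread management.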
Wave Computing was founded with the vision of delivering deep learning computers with game-changing computational performance and energy efficiency. Its objective is to enable businesses to analyze complex data in real time, with more accurate results, through fluid discovery and improvement in Deep Neural Network (DNN) development and training on its family of computers. Wave developed a novel Dataflow Processing Unit (DPU) architecture as part of a strategy to natively support a new wave of dataflow-model-based deep learning frameworks such as Google's TensorFlow and Microsoft's CNTK. Wave's family of deep learning computers achieves its best-in-class DNN training and inference performance through native support of these dataflow-based frameworks, a CPU-less high-bandwidth shared-memory architecture, and the DPU's 16,000 parallel processing elements and massive memory bandwidth. The result is a family of computers that delivers more than a 10x improvement in compute performance for DNN training and more than a 100x improvement in performance for DNN inference.
Real-time streaming predictions using Google Cloud Dataflow and Google Cloud Machine Learning

Google Cloud Dataflow is probably already embedded somewhere in your daily life: it enables companies to process huge amounts of data in real time. But imagine combining this, also in real time, with the prediction power of neural networks. That is exactly what this blogpost is about! It all started with some fiddling around with Apache Beam, an incubating Apache project that provides a programming model handling both batch and stream processing jobs. We wanted to test its streaming capabilities by running a pipeline on Google Cloud Dataflow, a Google-managed service for running such pipelines.
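Beam's central idea is that one pipeline of transforms can consume either a bounded batch or an unbounded stream. The sketch below imitates that idea with plain Python generators so it stays self-contained; it is a conceptual illustration only and does not use the real Apache Beam SDK (whose Python pipelines are built from `beam.Pipeline` and `|`-chained PTransforms), and the record format and threshold step are invented for the example.

```python
# Conceptual sketch of a unified batch/stream pipeline using plain
# Python generators -- NOT the real Apache Beam SDK.

def parse(records):
    # Transform 1: parse raw "key,score" records into (key, score) pairs.
    for rec in records:
        key, value = rec.split(",")
        yield key, float(value)

def keep_confident(pairs, limit=0.5):
    # Transform 2: keep only elements whose score clears a threshold,
    # standing in for the model-prediction step of the blogpost.
    for key, score in pairs:
        if score >= limit:
            yield key

def pipeline(source):
    # The same composed pipeline works whether 'source' is a finite
    # list (batch mode) or an unbounded generator (streaming mode).
    return keep_confident(parse(source))

batch = ["a,0.9", "b,0.2", "c,0.7"]
accepted = list(pipeline(batch))
print(accepted)   # ['a', 'c']
```

Because the transforms are lazy, feeding `pipeline` an endless generator of incoming messages instead of `batch` yields results element by element, which is the streaming behavior Dataflow provides as a managed service.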
For one, execution of the code can be systematically parallelized. This lets the developer avoid thinking about locks, threads, or how to use async APIs and callbacks. Yes, in Python you can write a bunch of code to run functions concurrently, but you won't be able to use multiple CPUs unless you start sharding things across processes, leaving quite a bit of orchestration and marshaling code to maintain (psst … Composable does similar things under the hood). Making your code performant and scalable doesn't become an afterthought; it's baked in from the beginning with Composable.
The rapid evolution of deep learning has started an AI arms race. Last year, venture capitalists poured more than $1.5 billion into semiconductor start-ups, and some 45 companies are now designing chips purpose-built for artificial intelligence tasks, including Google with its Tensor Processing Unit (TPU). After quietly testing its "early access" system for nearly a year, one of these startups, Wave Computing, is close to announcing its first commercial product, and it promises that a novel approach will deliver big gains in both performance and ease of use for training neural networks. "A bunch of companies will have TPU knock-offs, but that's not what we do. This was a multi-year, multi-million-dollar effort to develop a completely new architecture," CEO Derek Meyer said in an interview.