Project Catapult is the code name for a Microsoft Research (MSR) enterprise-level initiative that is transforming cloud computing by augmenting CPUs with an interconnected and configurable compute layer composed of programmable silicon. We are living in an era where information grows exponentially and creates the need for massive computing power to process that information. At the same time, advances in silicon fabrication technology are approaching theoretical limits, and Moore's Law has run its course. Chip performance improvements no longer keep pace with the needs of cutting-edge, computationally expensive workloads like software-defined networking (SDN) and artificial intelligence (AI). To create a faster, more intelligent cloud that keeps up with growing appetites for computing power, datacenters need to add other processors distinctly suited for critical workloads.
Datacenters, especially the really big guys known as the Super 7 (a group that includes Alibaba, Baidu, Facebook, Google, Microsoft and Tencent), are experiencing significant growth in key workloads that require more performance than can be squeezed out of even the fastest CPUs. Applications such as Deep Neural Networks (DNNs) for Artificial Intelligence (AI), complex data analytics, 4K live streaming video and advanced networking and security features are increasingly being offloaded to super-fast accelerators that can provide 10X or more the performance of a CPU. NVIDIA GPUs in particular have benefited enormously from the training portion of machine learning, with the company reporting 193% Y/Y growth last quarter in its datacenter segment, which is now approaching a $1B run-rate business. Microsoft has recently announced that Field Programmable Gate Array (FPGA) accelerators have become pervasive in its datacenters. Soon after, it was announced that Baidu is using such devices for acceleration of machine learning applied to speech processing and autonomous vehicles.
The insatiable appetite for higher throughput and lower latency, particularly for edge analytics and AI, network functions, and a range of data center acceleration needs, has compelled IT managers and chip makers to venture increasingly beyond CPUs and GPUs. The "inherent parallelism" of FPGAs (see below) in handling specialized workloads in AI- and HPDA-related implementations has drawn greater investment from IT decision makers and vendors, who see increasing justification for taking on the challenge of FPGA programming. Of course, adoption of unfamiliar technologies is always painful and slow, particularly for those without a built-out ecosystem of frameworks and APIs that simplify their use. Why are FPGAs bursting out of their communication, industrial and military niches and into the data center? Partly because of the limits of CPUs, which have their roots on the desktop and were, said Steve Conway, senior research VP at Hyperion Research, never really intended for advanced computing.
Why are so many companies suddenly jumping into the datacenter accelerator game? Major chip companies, as well as startups such as Nervana (being acquired by Intel), Wave Computing, GraphCore, KnuPath and others, are all vying for a piece of a rapidly growing market. That market consists primarily of just seven customers, the operators of the world's largest datacenters, Tencent among them. These companies are increasingly turning to technologies that can run specific algorithms at least 10 times faster in order to meet the demand for applications such as machine learning, ultra-high-definition video streaming and complex data analytics. While GPUs (Graphics Processing Units) from NVIDIA have been leading much of this trend, Field Programmable Gate Arrays (FPGAs) are now contending to become a major player.
Cloud and datacenter architects searching for new ways to pack more artificial intelligence horsepower into already constrained spaces will want to take a close look at Intel's new Nervana Neural Network Processors. Depending on the application, the processors may offer four times the performance or one-fifth the power draw of commercially available alternatives. The new processors are Intel's first ASIC offerings tailored specifically for deep learning workloads. The company announced last week that the processors are now shipping. In addition to the NNP-T1000 for training and the NNP-I1000 for inference, Intel also announced the coming generation of the Movidius Myriad Vision Processing Unit, which is designed for AI vision and inference processing at the edge.