 stratix 10


FPGA chips are coming on fast in the race to accelerate AI

#artificialintelligence

AI is hungry, hyperscale AI ravenous. As AI models rapidly get larger and more complex (an estimated 10x a year), a recent MIT study warns that computational challenges, especially in deep learning, will continue to grow. Service providers, large enterprises and others also face unrelenting pressures to speed up innovation, performance, and rollouts of neural networks and other low-latency, data-intensive applications, often involving exascale cloud and High-Performance Computing (HPC). These dueling demands are driving technology advances and adoption of a growing universe of Field Programmable Gate Arrays (FPGAs). In the early days of exascale computing and AI, these customer-configurable integrated circuits played a key role.


Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing

arXiv.org Artificial Intelligence

This paper presents Systolic-CNN, an OpenCL-defined, scalable, run-time-flexible FPGA accelerator architecture optimized for accelerating the inference of various convolutional neural networks (CNNs) in multi-tenancy cloud/edge computing. Existing OpenCL-defined FPGA accelerators for CNN inference fall short because they offer limited flexibility for supporting multiple CNN models at run time and scale poorly, which leaves FPGA resources underutilized and limits computational parallelism. Systolic-CNN adopts a highly pipelined and parallelized 1-D systolic array architecture, which efficiently exploits both spatial and temporal parallelism for accelerating CNN inference on FPGAs. Systolic-CNN is highly scalable and parameterized, so users can easily adapt it to achieve up to 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) of a given FPGA. Systolic-CNN is also run-time-flexible in the context of multi-tenancy cloud/edge computing: it can be time-shared to accelerate a variety of CNN models at run time without recompiling the FPGA kernel hardware or reprogramming the FPGA. Experimental results on an Intel Arria/Stratix 10 GX FPGA Development board show that the optimized single-precision implementation of Systolic-CNN achieves an average inference latency of 7 ms/2 ms, 84 ms/33 ms, 202 ms/73 ms, 1615 ms/873 ms, and 900 ms/498 ms per image for AlexNet, ResNet-50, ResNet-152, RetinaNet, and Light-weight RetinaNet, respectively. Code is available at https://github.com/PSCLab-ASU/Systolic-CNN.
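To make the idea of a 1-D systolic array concrete, here is a minimal Python sketch of a textbook systolic chain computing a 1-D convolution. It is not the Systolic-CNN design, which is written in OpenCL and whose internals the abstract does not describe. The function names, the two-register input delay, and the one-register partial-sum delay are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def systolic_conv1d(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Cycle-by-cycle simulation of a hypothetical 1-D systolic array.

    PE k holds weight w[k]. Inputs move right with two delay registers per
    PE, partial sums move right with one, so each partial sum meets the
    right input sample at every PE. Returns the full convolution of x and w.
    """
    taps = len(w)
    # x_hist[k] = [x input of PE k one cycle ago, two cycles ago]
    x_hist = [[0.0, 0.0] for _ in range(taps)]
    # y_reg[k] = partial sum produced by PE k in the previous cycle
    y_reg = [0.0] * taps
    out: List[float] = []

    for t in range(len(x) + 2 * (taps - 1)):
        x_in = [0.0] * taps
        y_out = [0.0] * taps
        for k in range(taps):
            if k == 0:
                # The first PE reads the input stream (zero-padded at the end).
                x_in[k] = x[t] if t < len(x) else 0.0
                y_in = 0.0
            else:
                x_in[k] = x_hist[k - 1][1]  # neighbor's input, 2 cycles late
                y_in = y_reg[k - 1]         # neighbor's sum, 1 cycle late
            y_out[k] = y_in + w[k] * x_in[k]  # one multiply-accumulate per PE

        # Clock edge: shift all registers at once.
        for k in range(taps):
            x_hist[k] = [x_in[k], x_hist[k][0]]
        y_reg = y_out

        # The last PE starts emitting valid results after taps - 1 cycles.
        if t >= taps - 1:
            out.append(y_out[-1])
    return out


def direct_conv1d(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Reference full convolution, used only to check the simulation."""
    y = [0.0] * (len(x) + len(w) - 1)
    for n, xn in enumerate(x):
        for k, wk in enumerate(w):
            y[n + k] += wk * xn
    return y


if __name__ == "__main__":
    print(systolic_conv1d([1, 2, 3], [1, 10]))
```

Every PE performs one multiply-accumulate per cycle once the pipeline is full, which illustrates the spatial parallelism (many PEs) and temporal parallelism (pipelining) the abstract refers to.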


Intel Drops A Bomb On The Silicon AI Market (NASDAQ:INTC)

#artificialintelligence

Early this year, I detailed that Intel (INTC) was poised to lead the AI revolution over the coming decade. The widespread adoption of AI will contribute significantly to demand for (Intel) compute silicon, and hence, will be a growth driver for the company. Intel forecasts it will be about a $25 billion opportunity by 2025, compared to $3.8 billion revenue in 2019. On June 18, Intel launched its third-generation Xeon Scalable platform, codenamed Cooper Lake. This follows a bit over a year after the company's April 2019 data-centric portfolio launch, which included second-generation Cascade Lake, the 10nm Agilex FPGA and 800 series of 100G Ethernet adapters.


Intel Launches Stratix 10 NX FPGAs Targeting AI Workloads

#artificialintelligence

Intel today introduced its first AI-optimized FPGA, the Stratix 10 NX, which features expanded AI Tensor blocks (30 multipliers and 30 accumulators), integrated HBM memory, and high bandwidth networking. The new chip continues leveraging Intel's chiplet architecture and the FPGA portion of the chip is fabbed using Intel's 14nm technology. Intel reports the new FPGA will deliver up to 15X more INT8 compute than the Stratix 10 MX, which was introduced in late 2017 and whose DSP block only had two multipliers and two accumulators. The new chip also features "up to 57.8 Gig PAM4 transceivers and hard Intel Ethernet blocks for high efficiency." The Stratix 10 NX will be available later this year, according to Intel.
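The "up to 15X" figure lines up with the per-block multiplier counts quoted above. The short Python sketch below works through that arithmetic. It assumes the two devices have the same number of blocks and the same clock rate, which the article does not state, and the function name is hypothetical.

```python
def per_block_multiplier_ratio(new_multipliers: int, old_multipliers: int) -> float:
    """Ratio of multipliers per block between two devices.

    Assumes equal block counts and clock rates (not stated in the article),
    so this is an upper-bound style estimate rather than a measured speedup.
    """
    if old_multipliers <= 0:
        raise ValueError("old_multipliers must be positive")
    return new_multipliers / old_multipliers


if __name__ == "__main__":
    # 30 multipliers per AI Tensor block vs. 2 per Stratix 10 MX DSP block.
    print(per_block_multiplier_ratio(30, 2))
```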


Microsoft Launches FPGA-Powered Machine Learning for Azure Customers

#artificialintelligence

At the Microsoft Build conference on Monday, the company kicked off a new cloud offering that would provide machine learning resources to cloud customers using Intel FPGA-accelerated servers. "I think this is a first step in making the FPGAs more of a general-purpose platform for customers," said Mark Russinovich, chief technical officer for Microsoft's Azure cloud computing platform. The technology is being offered as "preview," which apparently means only a limited set of capabilities and allocations are available. Also, at this point, only customers with accounts in the East US 2 region will be able to access the platform. This represents the commercialization of Microsoft's Project Brainwave, an FPGA-based machine learning platform the company developed over the past year.


Intel Proposes Its Embedded Processor Graphics For Real-Time Artificial Intelligence

#artificialintelligence

I was wrong to say that Intel (INTC) doesn't need GPUs to compete with Nvidia (NVDA) on artificial intelligence/deep learning computing. Further research told me that, along with its FPGAs (Field Programmable Gate Arrays), Intel has embedded Processor Graphics for deep learning inference. It's a new concept that was discussed by Intel only last May. Nvidia's GPU can be the Training Engine for deep learning computers. Intel's FPGAs and embedded Processor Graphics could be the go-to hardware accelerators for inference computing.


Microsoft unveils Brainwave, a system for running super-fast AI

#artificialintelligence

Microsoft made a splash in the world of dedicated AI hardware today when it unveiled a new system for doing high-speed, low-latency serving of machine learning models. The company showed off a new system called Brainwave that will allow developers to deploy machine learning models onto programmable silicon and achieve high performance beyond what they'd be able to get from a CPU or GPU. Researchers at the Hot Chips conference in Cupertino, California showed a Gated Recurrent Unit model running on Intel's new Stratix 10 field programmable gate array (FPGA) chip at a speed of 39.5 teraflops, without batching operations at all. The lack of batching means that it's possible for the hardware to handle requests as they come in, providing real-time insights for machine learning systems. The model that Microsoft chose is several times larger than convolutional neural networks like Alexnet and Resnet-50, which other companies have used to benchmark their own hardware.


Microsoft unveils Project Brainwave for real-time AI - Microsoft Research

#artificialintelligence

Today at Hot Chips 2017, our cross-Microsoft team unveiled a new deep learning acceleration platform, codenamed Project Brainwave. I'm delighted to share more details in this post, since Project Brainwave achieves a major leap forward in both performance and flexibility for cloud-based serving of deep learning models. We designed the system for real-time AI, which means the system processes requests as fast as it receives them, with ultra-low latency. Real-time AI is becoming increasingly important as cloud infrastructures process live data streams, whether they be search queries, videos, sensor streams, or interactions with users. First, Project Brainwave leverages the massive FPGA infrastructure that Microsoft has been deploying over the past few years.


Intel looks beyond x86, puts 64-bit ARM processor in new FPGA chip

PCWorld

It seems like the chip war between Intel and ARM is slowly winding down, at least for the time being. Intel for decades has doggedly sworn by chips based on its homegrown x86 architecture, but the company is putting a 64-bit ARM processor in its new Stratix 10 FPGA (field-programmable gate array), which was announced on Tuesday. The FPGA -- based on Altera technology -- can be reprogrammed to do a wide variety of server or network tasks. It can also run algorithms for machine learning. In a larger context, the chip points to a long-term strategy of Intel thinking beyond x86 and warming up to other architectures as it looks to shed its reliance on PCs.