Artificial intelligence is becoming an integral feature of most distributed computing architectures. As a result, AI hardware accelerators have become a principal competitive battlefront in high tech, with semiconductor manufacturers such as NVIDIA, AMD, and Intel at the forefront. In recent months, vendors of AI acceleration chips have intensified this competition. One of the most recent milestones was Intel's unveiling of Ponte Vecchio, its new AI-optimized generation of graphics processing units (GPUs), which is the first of several products from a larger Xe family of GPUs that will also accelerate gaming and high-performance computing workloads. In AI hardware acceleration, NVIDIA has been the chip vendor to beat, owing to its substantial market lead in GPUs and its continued improvements in the chips' performance, cost, efficiency, and other features.
The configurable, adaptable nature of FPGAs (field-programmable gate arrays), as well as of the system-on-chip architectures that employ them, makes the technology invaluable in a multitude of applications, from AI-powered data centers to smart edge devices and the IoT. As part of its evolving strategy, Silicon Valley bellwether Xilinx has been integrating this adaptive technology into platform accelerator solutions for machine learning, as well as into domain-specific architectures that combine compute resources such as ARM cores, high-speed I/O, and even RF functions. One problem with such modern heterogeneous compute architectures, however, is that they are difficult for the average software developer to work with: making the best use of the various compute resources in a modern system, from CPUs to GPUs and FPGAs, takes considerable hardware expertise. Today, however, Xilinx has announced a new, free, open-source tool, which it calls Vitis, aimed at making this hardware accessible to software developers.
Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high computational complexity of DNNs often necessitates extremely fast and efficient hardware, and the problem worsens as neural networks grow exponentially in size. As a result, customized hardware accelerators have been developed to speed up DNN processing without sacrificing model accuracy. However, previous accelerator design studies have not fully considered the characteristics of the target applications, which may lead to sub-optimal architecture designs. At the same time, new DNN models have been developed for better accuracy, but their compatibility with the underlying hardware accelerator is often overlooked. In this article, we propose an application-driven framework for architectural design space exploration of DNN accelerators. The framework is based on a hardware analytical model of individual DNN operations and formulates the accelerator design task as a multi-dimensional optimization problem. We demonstrate that it can be used effectively in application-driven accelerator architecture design: given a target DNN, the framework generates efficient accelerator design solutions with optimized performance and area. Furthermore, we explore the opportunity to use the framework to optimize accelerator configurations for multiple diverse DNN applications running simultaneously. The framework can also improve neural network models to best fit the underlying hardware resources.
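To make the idea of an analytical model driving design space exploration concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' framework: the output-stationary PE-array latency model, the linear area model, the constants `PE_AREA` and `FIXED_AREA`, and the names `Gemm`, `Design`, and `explore` are all hypothetical. Each DNN operation is lowered to a matrix multiply, a closed-form latency is computed for every candidate PE-array shape, and an exhaustive search keeps the fastest design that fits an area budget.

```python
import itertools
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gemm:
    """A DNN operation lowered to a matrix multiply: (m x k) @ (k x n)."""
    m: int
    k: int
    n: int


@dataclass(frozen=True)
class Design:
    """A candidate accelerator: a rows x cols array of processing elements (PEs)."""
    rows: int
    cols: int


# Hypothetical area constants in arbitrary units, chosen only for illustration.
PE_AREA = 1.0
FIXED_AREA = 50.0  # control logic, buffers, I/O, treated as a constant here


def latency_cycles(op: Gemm, design: Design) -> int:
    """Output-stationary estimate: each PE accumulates one output element over
    k cycles, so the array produces one rows x cols output tile every k cycles."""
    tiles = math.ceil(op.m / design.rows) * math.ceil(op.n / design.cols)
    return tiles * op.k


def area(design: Design) -> float:
    """Linear area model: fixed overhead plus a constant cost per PE."""
    return FIXED_AREA + PE_AREA * design.rows * design.cols


def explore(workload, area_budget, sizes=(4, 8, 16, 32, 64)):
    """Exhaustive search over PE-array shapes. Returns (latency, area, design)
    with the lowest total latency within the area budget, preferring the
    smaller area on ties, or None if no candidate fits."""
    best = None
    for rows, cols in itertools.product(sizes, repeat=2):
        design = Design(rows, cols)
        a = area(design)
        if a > area_budget:
            continue  # violates the area constraint
        lat = sum(latency_cycles(op, design) for op in workload)
        if best is None or (lat, a) < best[:2]:
            best = (lat, a, design)
    return best


if __name__ == "__main__":
    # Two hypothetical layers, e.g. from two different networks sharing the accelerator.
    workload = [Gemm(64, 128, 64), Gemm(196, 576, 128)]
    print(explore(workload, area_budget=1074.0))
```

Because the workload is just a list of operations, concatenating the layers of several networks gives a crude version of the multi-application setting described above. A fuller model would add memory bandwidth, buffer capacity, and dataflow choice as further dimensions of the search.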
An AI accelerator is a kind of specialised hardware accelerator or computer system designed to accelerate artificial intelligence applications, particularly artificial neural networks, machine learning, robotics, and other data-intensive or sensor-driven tasks. Such accelerators often have novel designs and typically focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability. As deep learning and artificial intelligence workloads grew in prominence over the last decade, specialised hardware units were designed, or adapted from existing products, to accelerate these tasks and to provide parallel, high-throughput systems for workstations targeted at various applications, including neural network simulations. As of 2018, a typical AI integrated circuit chip contains billions of MOSFETs. Hardware acceleration has many advantages, the main one being speed: accelerators can greatly decrease the time it takes to train and execute an AI model, and can also execute specialised AI tasks that would be impractically slow on a general-purpose CPU.
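Low-precision arithmetic is one of the techniques named above. The sketch below illustrates the general idea with symmetric per-tensor 8-bit quantization of a dot product in plain Python: operands are scaled to small integers, multiplied and accumulated in integer arithmetic, and rescaled once at the end. The quantization scheme and the function names are illustrative assumptions, not the method of any particular accelerator.

```python
def choose_scale(values):
    """Per-tensor symmetric scale that maps the largest magnitude onto 127."""
    peak = max(abs(v) for v in values)
    return peak / 127 if peak > 0 else 1.0


def quantize(values, scale):
    """q = clamp(round(x / scale), -127, 127); Python's round() sends exact
    halves to the nearest even integer."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(a, b):
    """Approximate a float dot product using 8-bit integer operands."""
    sa, sb = choose_scale(a), choose_scale(b)
    qa, qb = quantize(a, sa), quantize(b, sb)
    # Integer multiply-accumulate; hardware commonly accumulates into 32-bit registers.
    acc = sum(x * y for x, y in zip(qa, qb))
    # A single floating-point rescale at the end recovers the real-valued result.
    return acc * sa * sb


if __name__ == "__main__":
    a = [0.5, -1.2, 3.3, 0.07]
    b = [1.1, 0.4, -0.9, 2.0]
    exact = sum(x * y for x, y in zip(a, b))
    print(exact, int8_dot(a, b))
```

The quantized result differs slightly from the float dot product; trading that small error for cheaper 8-bit multipliers and lower memory traffic is what makes low-precision datapaths attractive.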
The following paper, "Simba: Scaling Deep-Learning Inference with Chiplet-Based Architecture," by Shao et al., presents a scalable deep learning accelerator architecture that tackles issues ranging from chip integration technology to workload partitioning and the effects of non-uniform latency on deep neural network performance. Through a hardware prototype, the authors present a timely study of cross-layer issues that will inform next-generation deep learning hardware, software, and neural network architectures. Chip vendors face significant challenges: the continued slowing of Moore's Law is increasing the time between new technology nodes, silicon manufacturing costs are skyrocketing, and Dennard scaling has ended. In the absence of device scaling, domain specialization gives architects an opportunity to deliver more performance and greater energy efficiency. However, domain specialization is an expensive proposition for chip manufacturers.
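The effect of non-uniform latency on workload partitioning can be illustrated with a toy model. In the hypothetical sketch below, a layer's output channels are split across chiplets on a 2D mesh, each chiplet starts only after its inputs have crossed some number of hops, and the layer finishes when the slowest chiplet does. The mesh topology, the linear per-hop delay, the cycle constants, and all function names are assumptions for illustration; the greedy split is not the paper's partitioning algorithm.

```python
def hops(src, dst):
    """Manhattan distance between two chiplets on a 2D mesh (assumed topology)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def layer_latency(assignment, grid, src, cycles_per_channel, cycles_per_hop):
    """The layer finishes when the slowest busy chiplet finishes: input transfer
    delay (hops * cycles_per_hop) plus compute time for its output channels."""
    return max(
        hops(src, pos) * cycles_per_hop + n * cycles_per_channel
        for pos, n in zip(grid, assignment)
        if n > 0
    )


def uniform_partition(channels, num_chiplets):
    """Even split, ignoring where each chiplet sits on the mesh."""
    base, extra = divmod(channels, num_chiplets)
    return [base + (1 if i < extra else 0) for i in range(num_chiplets)]


def latency_aware_partition(channels, grid, src, cycles_per_channel, cycles_per_hop):
    """Greedy split: give each output channel to the chiplet that would finish it earliest."""
    counts = [0] * len(grid)
    for _ in range(channels):
        finish = [
            hops(src, pos) * cycles_per_hop + (c + 1) * cycles_per_channel
            for pos, c in zip(grid, counts)
        ]
        counts[finish.index(min(finish))] += 1
    return counts


if __name__ == "__main__":
    grid = [(0, 0), (0, 1), (1, 0), (1, 1)]  # hypothetical 2 x 2 mesh of chiplets
    src = (0, 0)  # chiplet holding the layer's input activations
    even = uniform_partition(64, len(grid))
    aware = latency_aware_partition(64, grid, src, cycles_per_channel=10, cycles_per_hop=40)
    print(even, layer_latency(even, grid, src, 10, 40))
    print(aware, layer_latency(aware, grid, src, 10, 40))
```

With these made-up constants, the even split of 64 channels finishes in 240 cycles because the farthest chiplet starts last, while the greedy split shifts work toward nearby chiplets ([20, 16, 16, 12]) and finishes in 200 cycles.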