The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads

Amin, Shahid, Shah, Syed Pervez Hussnain

arXiv.org Artificial Intelligence

The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs), have grown in complexity, their massive computational demands have pushed traditional architectures to their limits. This paper provides a structured review of this co-evolution, analyzing the architectural landscape designed to accelerate modern AI workloads. We explore the dominant architectural paradigms, Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field-Programmable Gate Arrays (FPGAs), by breaking down their design philosophies, key features, and performance trade-offs. The core principles essential for performance and energy efficiency, including dataflow optimization, advanced memory hierarchies, sparsity, and quantization, are analyzed. Furthermore, this paper looks ahead to emerging technologies such as Processing-in-Memory (PIM) and neuromorphic computing, which may redefine future computation. By synthesizing architectural principles with quantitative performance data from industry-standard benchmarks, this survey presents a comprehensive picture of the AI accelerator landscape. We conclude that AI and computer architecture are in a symbiotic relationship, where hardware-software co-design is no longer an optimization but a necessity for future progress in computing.
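
The abstract names quantization as one of the core efficiency principles surveyed. As a hedged illustration only (the layer shape, values, and the symmetric per-tensor scheme below are assumptions, not taken from the paper), a minimal NumPy sketch of int8 post-training weight quantization looks like this:

    # Minimal sketch of symmetric per-tensor int8 weight quantization.
    # Shapes and values are illustrative, not from the survey.
    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max() / 127.0            # map the largest magnitude to 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(256, 256).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", np.abs(dequantize(q, s) - w).max())

Accelerators exploit such low-precision representations because int8 multiply-accumulate units are far smaller and more energy-efficient than their float32 counterparts.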


GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis

Communications of the ACM

Genomics is playing an important role in transforming healthcare. Genetic data, however, is being produced at a rate that far outpaces Moore's Law. Many efforts have been made to accelerate genomics kernels on modern commodity hardware, such as CPUs and GPUs, as well as custom accelerators (ASICs) for specific genomics kernels. While ASICs provide higher performance and energy efficiency than general-purpose hardware, they incur a high hardware-design cost. Moreover, to extract the best performance, ASICs tend to have significantly different architectures for different kernels.
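
To make "genomics kernels" concrete, the sketch below implements the Smith-Waterman local-alignment recurrence, a canonical example of the dynamic-programming kernels such accelerators target; the scoring parameters are illustrative and not specific to GenDP.

    # Smith-Waterman local alignment score with a linear gap penalty.
    # Illustrative scoring values; not GenDP's actual configuration.
    def smith_waterman(a, b, match=2, mismatch=-1, gap=1):
        H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        best = 0
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                H[i][j] = max(0,
                              H[i - 1][j - 1] + s,   # match / mismatch
                              H[i - 1][j] - gap,     # deletion
                              H[i][j - 1] - gap)     # insertion
                best = max(best, H[i][j])
        return best

    print(smith_waterman("GATTACA", "GCATGCU"))

The inner max over neighbouring cells is the pattern that both ASICs and programmable frameworks accelerate, since every anti-diagonal of the score matrix can be computed in parallel.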


Embedded FPGA Developments in 130nm and 28nm CMOS for Machine Learning in Particle Detector Readout

Gonski, Julia, Gupta, Aseem, Jia, Haoyi, Kim, Hyunjoon, Rota, Lorenzo, Ruckman, Larry, Dragone, Angelo, Herbst, Ryan

arXiv.org Artificial Intelligence

Embedded field programmable gate array (eFPGA) technology allows the implementation of reconfigurable logic within the design of an application-specific integrated circuit (ASIC). This approach offers the low power and efficiency of an ASIC along with the ease of FPGA configuration, particularly beneficial for the use case of machine learning in the data pipeline of next-generation collider experiments. An open-source framework called "FABulous" was used to design eFPGAs using 130 nm and 28 nm CMOS technology nodes, which were subsequently fabricated and verified through testing. The capability of an eFPGA to act as a front-end readout chip was assessed using simulation of high energy particles passing through a silicon pixel sensor. A machine learning-based classifier, designed for reduction of sensor data at the source, was synthesized and configured onto the eFPGA. A successful proof-of-concept was demonstrated through reproduction of the expected algorithm result on the eFPGA with perfect accuracy. Further development of the eFPGA technology and its application to collider detector readout is discussed.
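
As a loose illustration of "reduction of sensor data at the source" (the weights, input size, and decision rule below are invented for this sketch and are not the classifier described in the paper), an integer-only hit filter of the kind that fits reconfigurable logic might look like:

    # Hypothetical integer-arithmetic hit filter; all parameters invented.
    import numpy as np

    W = np.array([[3, -1, 2, 1],
                  [-2, 4, 1, -1]], dtype=np.int32)   # 2 classes x 4 pixel inputs
    BIAS = np.array([1, -2], dtype=np.int32)

    def keep_hit(pixel_charges):
        scores = W @ pixel_charges + BIAS            # integer multiply-accumulate only
        return int(scores[1] > scores[0])            # 1 = keep, 0 = drop at the source

    print(keep_hit(np.array([5, 0, 3, 1], dtype=np.int32)))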


ASIC: Aligning Sparse in-the-wild Image Collections

Gupta, Kamal, Jampani, Varun, Esteves, Carlos, Shrivastava, Abhinav, Makadia, Ameesh, Snavely, Noah, Kar, Abhishek

arXiv.org Artificial Intelligence

Prior works assume either ground-truth keypoint annotations or a large dataset of images of a single object category. However, neither of the above assumptions holds true for the long tail of objects present in the world. The same is true for an image of a "never-before-seen" object (as opposed to a common object category such as cars), where humans demonstrate surprisingly robust generalization despite lacking object- or category-specific priors [6]. These correspondences in turn inform downstream inferences about the object, such as shape, affordances, and more. In this work, we tackle the problem of "low-shot dense correspondence": given only a small in-the-wild image collection (10-30 images) of an object or object category, we recover dense and consistent correspondences across the entire collection. We present a self-supervised technique that directly optimizes on a sparse collection of images of a particular object or object category to obtain consistent dense correspondences across the collection. We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches and make them dense.
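
The matching step the abstract refers to can be sketched as mutual nearest-neighbour search over pre-trained ViT patch features; the feature dimensions below are assumptions for illustration, and the paper's subsequent densification and optimization are not shown.

    # Mutual nearest-neighbour matches between two images' ViT patch
    # features (L2-normalised). Shapes are illustrative.
    import numpy as np

    def mutual_nn(feat_a, feat_b):
        sim = feat_a @ feat_b.T                 # cosine similarity matrix
        nn_ab = sim.argmax(axis=1)              # best patch in b for each patch in a
        nn_ba = sim.argmax(axis=0)              # best patch in a for each patch in b
        return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

    fa = np.random.randn(196, 384); fa /= np.linalg.norm(fa, axis=1, keepdims=True)
    fb = np.random.randn(196, 384); fb /= np.linalg.norm(fb, axis=1, keepdims=True)
    print(len(mutual_nn(fa, fb)))

These mutual matches are sparse and noisy, which is exactly why the method then optimizes over the whole collection to turn them into dense, consistent correspondences.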


Demonstrating Analog Inference on the BrainScaleS-2 Mobile System

Stradmann, Yannik, Billaudelle, Sebastian, Breitwieser, Oliver, Ebert, Falk Leonard, Emmel, Arne, Husmann, Dan, Ilmberger, Joscha, Müller, Eric, Spilger, Philipp, Weis, Johannes, Schemmel, Johannes

arXiv.org Artificial Intelligence

We present the BrainScaleS-2 mobile system as a compact analog inference engine based on the BrainScaleS-2 ASIC and demonstrate its capabilities at classifying a medical electrocardiogram dataset. The analog network core of the ASIC is utilized to perform the multiply-accumulate operations of a convolutional deep neural network. At a system power consumption of 5.6 W, we measure a total energy consumption of 192 µJ for the ASIC and achieve a classification time of 276 µs per electrocardiographic patient sample. Patients with atrial fibrillation are correctly identified with a detection rate of (93.7 ± 0.7) % at (14.0 ± 1.0) % false positives. The system is directly applicable to edge inference applications due to its small size, power envelope, and flexible I/O capabilities. It has enabled the BrainScaleS-2 ASIC to be operated reliably outside a specialized lab setting. In future applications, the system allows for a combination of conventional machine learning layers with online learning in spiking neural networks on a single neuromorphic platform.
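
The reported figures can be cross-checked with simple arithmetic: at 5.6 W system power and 276 µs per sample, the whole system spends roughly 1.5 mJ per classification, of which the quoted 192 µJ for the ASIC is only about an eighth.

    # Back-of-the-envelope check of the reported per-sample energy.
    system_power_w = 5.6
    latency_s = 276e-6
    asic_energy_j = 192e-6

    system_energy_j = system_power_w * latency_s
    print(f"system energy per sample: {system_energy_j * 1e3:.2f} mJ")            # ~1.55 mJ
    print(f"ASIC share of system energy: {asic_energy_j / system_energy_j:.1%}")   # ~12%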


Artificial-intelligence hardware: New opportunities for semiconductor companies

#artificialintelligence

Software has been the star of high tech over the past few decades, and it's easy to understand why. With PCs and mobile phones, the game-changing innovations that defined this era, the architecture and software layers of the technology stack enabled several important advances. In this environment, semiconductor companies were in a difficult position. Although their innovations in chip design and fabrication enabled next-generation devices, they received only a small share of the value coming from the technology stack--about 20 to 30 percent with PCs and 10 to 20 percent with mobile. But the story for semiconductor companies could be different with the growth of artificial intelligence (AI)--typically defined as the ability of a machine to perform cognitive functions associated with human minds, such as perceiving, reasoning, and learning. Many AI applications have already gained a wide following, including virtual assistants that manage our homes and facial-recognition programs that track criminals. What will this development mean for semiconductor sales and revenues?


Deep Learning Part 3/4

#artificialintelligence

Hardware is the foundation that deep learning is built on, providing the capability and readiness to help people categorize objects, improve speech recognition, understand visualizations, or serve any other purpose that motivates people to use deep learning. When analyzing deep learning's computational needs, a handful of acronyms spell out the hardware requirements: GPUs, TPUs, FPGAs, and ASICs are the key hardware components that make deep learning work, especially amid recent concerns that its progress has stalled. These types of hardware consume a lot of power and support large deep learning models that CPUs and regular laptops can't manage. How does each of these hardware types meet these needs while addressing the computational limits that keep deep learning from reaching its full potential?


Nvidia Will Be A Prime Contractor For Big AI Supercomputers

#artificialintelligence

Normally, when we look at a system, we start from the compute engines at a very fine level of detail and then work our way out across the intricacies of the nodes, the interconnect, and the software stack that scales it across the nodes into a distributed computing platform. But this time, as we go over the many announcements that Nvidia is making at its GPU Technology Conference (GTC) 2022 online event, we want to start at the middle layer, where the nodes meet the network, and work our way up, because this is what makes Nvidia a real contender as a high performance computing system maker – meaning machines designed to run AI, HPC, and data analytics workloads, not just traditional HPC simulation and modeling. In fact, we think the innovations unleashed at GTC 2022 this year are going to make Nvidia one of the key prime contractors for such systems operating at exascale and beyond. To play that game, you have to have architecture and deep pockets, and Nvidia clearly has both. With IBM basically out of the game, capability-class supercomputers are coming down to Hewlett Packard Enterprise, Nvidia, Fujitsu (the latter being pretty much focused on RIKEN Lab in Japan and a few other centers that buy chips off the "K" and "Fugaku" blocks), and Atos (which is doing a lot of business with its BullSequana systems in Europe).


Know the Edge AI Ecosystem

#artificialintelligence

Successful adoption of Edge AI requires understanding and integrating different elements so that this stack can be seamlessly deployed in the target environment. Implementing an Edge AI application requires an understanding of aspects like the tasks to be performed, hardware, frameworks, and models. For deep neural networks to run at the edge, hardware, frameworks, and tools need to work together. As edge AI applications vary according to the use case, these requirements need to be thought through for each scenario. It is necessary to select hardware, frameworks, and tools that are compatible with each other and best suited to the use case. Below we briefly discuss a few of the frameworks, hardware processors, and development boards.


Council Post: The Possibilities Of AI In 2030: Transformation Across Dimensions

#artificialintelligence

By 2030, AI will likely no longer be limited to adoption in simple scenarios and applications. It will be expected to detect life-threatening diseases at a nascent stage, predict weather conditions across a large area over several months, and become a digital collaborator to the human race. These are just a few possibilities of the potential impact of AI on life and work in the coming years. The pace of change has been unprecedented in the sector, and it promises to continue in the same vein in the years to come. With rapid learning and adoption, AI is no longer a crystal-ball technology but something that humans now interact with in nearly every sphere of life.