Why Google's custom AI chips are shaking up the tech industry

New Scientist

Ironwood is Google's latest tensor processing unit

Nvidia's position as the dominant supplier of AI chips may be under threat from a specialised chip pioneered by Google, with reports suggesting companies like Meta and Anthropic are looking to spend billions on Google's tensor processing units.

The success of the artificial intelligence industry has been based in large part on graphics processing units (GPUs), a kind of computer chip that can perform many calculations in parallel, rather than one after the other like the central processing units (CPUs) that power most computers.

GPUs were originally developed, as the name suggests, to assist with computer graphics and gaming. "If I have a lot of pixels in a space and I need to do a rotation of this to calculate a new camera view, this is an operation that can be done in parallel, for many different pixels," says Francesco Conti at the University of Bologna in Italy. This ability to do calculations in parallel happened to be useful for training and running AI models, which often rely on operations over vast grids of numbers performed at the same time, called matrix multiplication.
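The parallelism the article describes can be seen in plain code: in a matrix multiplication, every output cell depends only on one row of the first matrix and one column of the second, so all cells can in principle be computed simultaneously. This is an illustrative sketch, not TPU or GPU code.

```python
# Illustrative sketch: each output cell of a matrix product is
# independent of every other cell, which is what lets GPUs and TPUs
# compute them all in parallel.

def matmul_cell(A, B, i, j):
    """Compute the single output cell C[i][j], independent of all others."""
    return sum(A[i][k] * B[k][j] for k in range(len(B)))

def matmul(A, B):
    """Full product; in hardware, the per-cell jobs run at the same time."""
    rows, cols = len(A), len(B[0])
    return [[matmul_cell(A, B, i, j) for j in range(cols)] for i in range(rows)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```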



Inside the California 'AI factory' that showcases the contradiction at the heart of the tech race

BBC News

Google's ultra-private CEO Sundar Pichai is showing me around Googleplex, its California headquarters. A walkway runs along the length of it, passing by a giant dinosaur skeleton, a beach volleyball pitch and dozens of Googlers lunching under the hazy November sun. But it's a laboratory, hidden away at the back of the campus behind some trees, that he is most excited to show me. This is where the invention that Google believes is its secret weapon is being developed. Known as a Tensor Processing Unit (or TPU), it looks like an unassuming little chip but, says Mr Pichai, it will one day power every AI query that goes through Google.




The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads

Amin, Shahid, Shah, Syed Pervez Hussnain

arXiv.org Artificial Intelligence

The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs), have grown in complexity, their massive computational demands have pushed traditional architectures to their limits. This paper provides a structured review of this co-evolution, analyzing the architectural landscape designed to accelerate modern AI workloads. We explore the dominant architectural paradigms: Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field-Programmable Gate Arrays (FPGAs), breaking down their design philosophies, key features, and performance trade-offs. The core principles essential for performance and energy efficiency, including dataflow optimization, advanced memory hierarchies, sparsity, and quantization, are analyzed. Furthermore, this paper looks ahead to emerging technologies such as Processing-in-Memory (PIM) and neuromorphic computing, which may redefine future computation. By synthesizing architectural principles with quantitative performance data from industry-standard benchmarks, this survey presents a comprehensive picture of the AI accelerator landscape. We conclude that AI and computer architecture are in a symbiotic relationship, where hardware-software co-design is no longer an optimization but a necessity for future progress in computing.
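One of the efficiency principles the abstract names, quantization, is easy to sketch. The snippet below shows the standard symmetric int8 scheme (scale by the largest magnitude); it is a generic illustration, not any specific accelerator's implementation from the paper.

```python
# Hedged sketch of post-training symmetric int8 quantization: mapping
# float weights to 8-bit integers shrinks memory traffic, one of the
# dominant energy costs in AI accelerators.

def quantize_int8(weights):
    """Symmetric quantization: divide by a scale so the max maps to 127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
approx = dequantize(q, s)
# q values lie in [-127, 127]; the round trip loses at most scale/2 per weight
```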


Efficient Deployment of CNN Models on Multiple In-Memory Computing Units

Bougioukou, Eleni, Antonakopoulos, Theodore

arXiv.org Artificial Intelligence

Abstract--In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration by mitigating data movement bottlenecks and leveraging the inherent parallelism of memory-based computations. In this work, we exploit an IMC Emulator (IMCE) with multiple Processing Units (PUs) to investigate how the deployment of a CNN model in a multi-processing system affects its performance, in terms of processing rate and latency. For that purpose, we introduce the Load-Balance-Longest-Path (LBLP) algorithm, which dynamically assigns all CNN nodes to the available IMCE PUs, maximizing the processing rate and minimizing latency through efficient resource utilization. We benchmark LBLP against alternative scheduling strategies on a number of CNN models, and experimental results demonstrate the effectiveness of the proposed algorithm. With the rapid growth of the Internet of Things (IoT) and Cloud Computing, there is a growing need for efficient deep learning models that can operate on diverse computing platforms, ranging from resource-constrained edge devices to high-performance data centers. Among others, Convolutional Neural Networks (CNNs) have become a cornerstone of deep learning [1], driving advances in image classification, object detection, and other computer vision tasks.
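The abstract does not spell out LBLP's internals, so the sketch below shows only the general load-balancing idea behind such schedulers: the classic greedy longest-processing-time-first heuristic, which sorts node costs in descending order and always places the next node on the least-loaded processing unit. The node names and costs are hypothetical.

```python
# Illustrative load-balancing sketch (not the paper's LBLP algorithm):
# greedy longest-processing-time-first assignment of CNN nodes to PUs.
import heapq

def greedy_assign(node_costs, num_pus):
    """Map each node to a PU, always choosing the least-loaded PU."""
    heap = [(0.0, pu) for pu in range(num_pus)]   # (current load, PU id)
    heapq.heapify(heap)
    assignment = {}
    for node, cost in sorted(node_costs.items(), key=lambda kv: -kv[1]):
        load, pu = heapq.heappop(heap)            # least-loaded PU
        assignment[node] = pu
        heapq.heappush(heap, (load + cost, pu))
    return assignment

costs = {"conv1": 8, "conv2": 7, "conv3": 6, "fc": 5, "pool": 2}
plan = greedy_assign(costs, 2)   # e.g. heavy nodes split across both PUs
```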


Extropic Aims to Disrupt the Data Center Bonanza

WIRED

A startup hopes to challenge Nvidia, AMD, and Intel with a chip that wrangles probabilities rather than ones and zeros. Extropic, a startup developing an exotic new kind of computer chip that handles probabilistic bits, has produced its first working hardware, called XTR-0, along with proof that more advanced systems will tackle useful tasks in artificial intelligence and scientific research. The startup's chips work in a fundamentally different way from chips made by Nvidia, AMD, and others, and Extropic claims they could be thousands of times more energy efficient when scaled up. With AI companies pouring billions of dollars into building data centers, a completely new approach could offer a far less costly alternative to vast arrays of conventional chips.
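Extropic has not published XTR-0's internals, but the probabilistic-computing literature commonly models a "p-bit" as a unit whose output is 1 with a probability set by a tunable bias, letting physical noise do work that deterministic logic would pay energy for. The software simulation below is purely illustrative of that general idea, not of Extropic's design.

```python
# Hypothetical software model of a probabilistic bit (p-bit):
# the unit outputs 1 with probability sigmoid(bias).
import math
import random

def pbit(bias, rng):
    """Sample one probabilistic bit: P(output = 1) = 1 / (1 + e^-bias)."""
    return 1 if rng.random() < 1.0 / (1.0 + math.exp(-bias)) else 0

rng = random.Random(0)
samples = [pbit(2.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
# mean approaches sigmoid(2.0), roughly 0.88, as the sample count grows
```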


An Automated Tape Laying System Employing a Uniaxial Force Control Device

Rameder, Bernhard, Gattringer, Hubert, Naderer, Ronald, Mueller, Andreas

arXiv.org Artificial Intelligence

This paper deals with the design of a cost-effective automated tape laying (ATL) system with integrated uniaxial force control, to ensure the necessary compaction forces, and with accurate temperature control, to guarantee that the tape is melted appropriately. It is crucial to bring the substrate and the oncoming tape to a specific temperature level to ensure optimal consolidation between the different layers of the product. Several process steps are therefore required to take the spooled tape from the coil until it is finally tacked onto the desired mold. The system is divided into the tape storage spool, a tape-guiding roller, a tape processing unit, a heating zone and the consolidation unit. Moreover, a special robot control concept for testing the ATL system is presented. In contrast to many other systems, with this approach the tape laying device is spatially fixed and the shape is moved accordingly by the robot, which allows for handling of rather compact and complex shapes. The functionality of the subsystems and of the taping process itself was finally validated in experiments using a carbon fiber reinforced HDPE tape.
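The paper's controller details are not reproduced in the abstract; as a minimal, hypothetical sketch, uniaxial force control of the compaction unit can be framed as a discrete PI loop driving an actuator toward a force setpoint. The gains, time step, and toy plant below are assumptions for illustration only.

```python
# Hedged sketch of uniaxial force control as a discrete PI loop
# (illustrative gains, not the paper's controller).

def pi_force_step(setpoint_n, measured_n, integral, kp=0.8, ki=0.2, dt=0.01):
    """One control step: return (actuator command, updated integral term)."""
    error = setpoint_n - measured_n
    integral += error * dt
    command = kp * error + ki * integral
    return command, integral

# Toy closed loop: a first-order plant whose force follows the command,
# converging toward a 50 N compaction setpoint.
force, integral = 0.0, 0.0
for _ in range(2000):
    u, integral = pi_force_step(50.0, force, integral)
    force += 0.05 * u
```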