AITopics | deep neural network accelerator

Collaborating Authors

deep neural network accelerator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators

Lübeck, Konstantin, Jung, Alexander Louis-Ferdinand, Wedlich, Felix, Müller, Mika Markus, Peccia, Federico Nicolás, Thömmes, Felix, Steinmetz, Jannik, Biermaier, Valentin, Frischknecht, Adrian, Bernardo, Paul Palomero, Bringmann, Oliver

arXiv.org Artificial IntelligenceSep-13-2024

Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models to accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures. Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN/hardware dependency graph analysis, which enables us, in the best case, to evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions achieving a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared to simulation results, while being several magnitudes faster than an RTL simulation.

automatic generation, deep neural network accelerator, fast and accurate performance model

arXiv.org Artificial Intelligence

2409.08595

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

Monitor Placement for Fault Localization in Deep Neural Network Accelerators

Liu, Wei-Kai, Tan, Benjamin, Chakrabarty, Krishnendu

arXiv.org Artificial IntelligenceNov-28-2023

Systolic arrays are a prominent choice for deep neural network (DNN) accelerators because they offer parallelism and efficient data reuse. Improving the reliability of DNN accelerators is crucial as hardware faults can degrade the accuracy of DNN inferencing. Systolic arrays make use of a large number of processing elements (PEs) for parallel processing, but when one PE is faulty, the error propagates and affects the outcomes of downstream PEs. Due to the large number of PEs, the cost associated with implementing hardware-based runtime monitoring of every single PE is infeasible. We present a solution to optimize the placement of hardware monitors within systolic arrays. We first prove that $2N-1$ monitors are needed to localize a single faulty PE and we also derive the monitor placement. We show that a second placement optimization problem, which minimizes the set of candidate faulty PEs for a given number of monitors, is NP-hard. Therefore, we propose a heuristic approach to balance the reliability and hardware resource utilization in DNN accelerators when number of monitors is limited. Experimental evaluation shows that to localize a single faulty PE, an area overhead of only 0.33% is incurred for a $256\times 256$ systolic array.

deep neural network accelerator, fault localization, monitor placement

arXiv.org Artificial Intelligence

2311.16594

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback