Implementation of a Binary Neural Network on a Passive Array of Magnetic Tunnel Junctions

Goodwill, Jonathan M., Prasad, Nitin, Hoskins, Brian D., Daniels, Matthew W., Madhavan, Advait, Wan, Lei, Santos, Tiffany S., Tran, Michael, Katine, Jordan A., Braganca, Patrick M., Stiles, Mark D., McClelland, Jabez J.

arXiv.org Artificial Intelligence 

Avenues to mitigate the main issue, the von Neumann bottleneck, include in-memory and near-memory architectures, as well as algorithmic approaches. Here we leverage the low-power and inherently binary operation of magnetic tunnel junctions (MTJs) to demonstrate neural network hardware inference based on passive arrays of MTJs. In general, transferring a trained network model to hardware for inference is confronted by degradation in performance due to device-to-device variations, write errors, parasitic resistance, and nonidealities in the substrate. To quantify the effect of these hardware realities, we benchmark 300 unique weight matrix solutions of a 2-layer perceptron to classify the Wine dataset for both classification accuracy and write fidelity. Despite device imperfections, we achieve software-equivalent accuracy of up to 95.3 % with proper tuning of network parameters in 15 × 15 MTJ arrays having a range of device sizes. The success of this tuning process shows that new metrics are needed to characterize the performance and quality of networks reproduced in mixed signal hardware.

I. INTRODUCTION

Over the past decade, artificial intelligence algorithms have achieved human-level performance on increasingly complex tasks at the cost of increased neural network size, computing resources, and energy consumption [1-5]. OpenAI's GPT-3, for example, a state-of-the-art natural language processor, contains 175 billion parameters and requires 3.14 10 Running these algorithms for inference applications (applications that require the model to make predictions but not learn new information) requires lesser but still overwhelming amounts of energy. This energy inefficiency is in part due to implementing these algorithms using general-purpose hardware such as central and graphical processing units (CPUs and GPUs).
Because CPUs and GPUs have traditional von Neumann computing architectures, they do not store data in the same spatial location as where computation is carried out. For this reason, energy is consumed in moving the data, and the speed of computation is throttled by the time it takes to shuttle data from the storage location to the computation location. This so-called von Neumann bottleneck is severe for large neural network models: studies show that the majority of the network time and energy can be spent distributing gradient and model data [11-13]. Algorithmic approaches to lessening the data bottleneck have focused on simplifying neural network models to achieve equivalent accuracy with less memory overhead.
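
As a rough illustration of how binary weights interact with the hardware nonidealities named in the abstract (write errors and device-to-device variation), the following Python sketch simulates a single binary weight matrix read out from an idealized array. It is not the authors' implementation. The differential-pair conductance mapping, the values `g_p` and `g_ap`, the Gaussian variation model, and the `write_fidelity` definition (fraction of correctly written weights) are all assumptions made for illustration.

```python
import numpy as np

def binarize(w):
    """Map real-valued weights to {-1, +1} (zero maps to +1)."""
    return np.where(w >= 0, 1.0, -1.0)

def write_with_errors(w_bin, error_rate, rng):
    """Flip each binary weight independently with probability error_rate."""
    flips = rng.random(w_bin.shape) < error_rate
    return np.where(flips, -w_bin, w_bin)

def write_fidelity(w_target, w_written):
    """Hypothetical metric: fraction of weights written as intended."""
    return float(np.mean(w_target == w_written))

def crossbar_matvec(w_bin, x, g_p=1.0, g_ap=0.5, sigma=0.0, rng=None):
    """Idealized read-out, assuming two devices per weight.

    Assumed mapping: +1 -> (g_p, g_ap), -1 -> (g_ap, g_p), with the output
    taken as the difference of the two column currents. sigma sets a
    multiplicative Gaussian device-to-device spread (an assumed model).
    Parasitic resistance and sneak paths are ignored.
    """
    if rng is None:
        rng = np.random.default_rng()
    g_pos = np.where(w_bin > 0, g_p, g_ap)
    g_neg = np.where(w_bin > 0, g_ap, g_p)
    g_pos = g_pos * (1.0 + sigma * rng.standard_normal(g_pos.shape))
    g_neg = g_neg * (1.0 + sigma * rng.standard_normal(g_neg.shape))
    return (g_pos - g_neg) @ x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_target = binarize(rng.standard_normal((15, 15)))  # 15 x 15 array
    w_written = write_with_errors(w_target, error_rate=0.02, rng=rng)
    x = rng.random(15)  # input voltages (arbitrary units)
    ideal = crossbar_matvec(w_target, x)
    noisy = crossbar_matvec(w_written, x, sigma=0.05, rng=rng)
    print("write fidelity:", write_fidelity(w_target, w_written))
    print("max output deviation:", np.max(np.abs(ideal - noisy)))
```

With `sigma=0` and no write errors, the read-out reduces to `(g_p - g_ap) * (w_bin @ x)`, so any deviation from that value in this toy model comes only from the two simulated error sources.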
