Pre-Defined Sparse Neural Networks with Hardware Acceleration

Dey, Sourya, Huang, Kuan-Wen, Beerel, Peter A., Chugg, Keith M.

arXiv.org Machine Learning 

As more data have become available, the size and complexity of neural networks (NNs) have risen sharply, with modern NNs containing millions or even billions of trainable parameters [1], [2]. These massive NNs come at the cost of large computational and storage demands. The current state of the art is to train large NNs on graphics processing units (GPUs) in the cloud, a process that can take days to weeks even on powerful GPUs [1]-[3] or on similar programmable processors with multiply-accumulate accelerators [4]. Once trained, the model can be used for inference, which is less computationally intensive and is typically performed on more general-purpose processors, i.e., central processing units (CPUs). It is increasingly desirable to run inference, and even some retraining, on embedded processors, which have limited resources for computation and storage. In this regard, model reduction has been identified as a key to NN acceleration by several prominent researchers [5]. Model reduction is generally performed post-training to reduce the memory required to store the model for inference, e.g., via quantization, compression, and parameter grouping [6]-[9]. Decreasing the time, computation, storage, and energy costs of training and inference is therefore a highly relevant goal.
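The pre-defined sparsity named in the title can be contrasted with the post-training reduction methods above: the sparse connectivity pattern is fixed before training begins, so the reduced storage and compute apply to training as well as inference. A minimal NumPy sketch of this idea follows; the layer sizes, density value, and function names here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example of a pre-defined sparse fully connected layer:
# a binary connectivity mask is chosen once, before training, and only
# the surviving connections are ever stored or updated.
n_in, n_out = 8, 4
density = 0.25  # fraction of connections kept (illustrative value)

# Fixed binary mask: 1 where a connection exists, 0 otherwise.
mask = (rng.random((n_in, n_out)) < density).astype(np.float64)

# Weights are only defined on the pre-defined connections.
weights = rng.standard_normal((n_in, n_out)) * mask

def forward(x):
    # Only the pre-defined connections contribute to the output;
    # a gradient update would likewise be masked so zeros stay zero.
    return x @ (weights * mask)

x = np.ones(n_in)
y = forward(x)

# Storage shrinks roughly in proportion to the density: only the
# nonzero weights (plus their indices) need to be kept.
print(int(mask.sum()), "of", n_in * n_out, "connections kept")
```

Because the mask never changes, the same hardware datapath can exploit the fixed pattern during both training and inference, which is what distinguishes this approach from pruning a dense network after the fact.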
