Pre-Defined Sparse Neural Networks with Hardware Acceleration
Dey, Sourya, Huang, Kuan-Wen, Beerel, Peter A., Chugg, Keith M.
As more data have become available, the size and complexity of neural networks (NNs) have risen sharply, with modern NNs containing millions or even billions of trainable parameters [1], [2]. These massive NNs come at the cost of large computational and storage demands. The current state of the art is to train large NNs on graphics processing units (GPUs) in the cloud - a process that can take days to weeks even on powerful GPUs [1]-[3] or on similar programmable processors with multiply-accumulate accelerators [4]. Once trained, the model can be used for inference, which is less computationally intensive and is typically performed on more general-purpose processors, i.e., central processing units (CPUs). It is increasingly desirable to run inference, and even some retraining, on embedded processors, which have limited resources for computation and storage. In this regard, model reduction has been identified as a key to NN acceleration by several prominent researchers [5]. Such reduction is generally performed post-training to shrink the memory required to store the model for inference - e.g., via quantization, compression, and parameter grouping [6]-[9]. Decreasing the time, computation, storage, and energy costs of training and inference is therefore a highly relevant goal.
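In contrast to post-training reduction, the pre-defined sparsity in the title fixes a reduced connectivity pattern before training begins, so both training and inference operate on fewer parameters. The following sketch illustrates the general idea only; the function names, the per-neuron connection scheme, and the density parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def predefined_sparse_mask(n_in, n_out, density, seed=0):
    """Build a fixed binary connectivity mask, chosen before training,
    that keeps a `density` fraction of weights with an equal number of
    connections per output neuron. (Illustrative scheme, an assumption.)"""
    rng = np.random.default_rng(seed)
    fan_in = max(1, int(round(density * n_in)))  # connections per output
    mask = np.zeros((n_in, n_out), dtype=bool)
    for j in range(n_out):
        mask[rng.choice(n_in, size=fan_in, replace=False), j] = True
    return mask

def sparse_forward(x, W, mask):
    """Forward pass of a layer with the pre-defined mask applied: only
    masked-in weights ever need to be stored or multiplied."""
    return x @ (W * mask)

# Example: an 8-input, 4-output layer at 25% density stores and
# multiplies 8 weights instead of 32.
mask = predefined_sparse_mask(8, 4, density=0.25)
W = np.random.default_rng(1).standard_normal((8, 4))
x = np.ones((1, 8))
y = sparse_forward(x, W, mask)
```

Because the mask never changes, the saved multiply-accumulate operations apply during training as well as inference, which is what distinguishes this approach from post-training quantization or compression.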
Dec-3-2018