predefined sparsity
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Capogrosso, Luigi, Fraccaroli, Enrico, Petrozziello, Giulio, Setti, Francesco, Chakraborty, Samarjit, Fummi, Franco, Cristani, Marco
In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: their computational requirements preclude deployment on most of the resource-constrained edge devices available today for real-time, real-world tasks. This paper introduces a novel approach to address this challenge by combining the concept of predefined sparsity with Split Computing (SC) and Early Exit (EE). In particular, SC splits a DNN so that one part runs on an edge device and the rest on a remote server, while EE allows the system to stop early and rely solely on the edge device's computation if the answer is already good enough. How to apply predefined sparsity to the SC and EE paradigm has never been studied before. This paper studies this problem and shows how predefined sparsity significantly reduces the computational, storage, and energy burdens during both the training and inference phases, regardless of the hardware platform. This makes it a valuable approach for enhancing the performance of SC and EE applications. Experimental results showcase reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee.
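The combination the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: a layer whose sparse support (binary mask) is fixed before training, an early-exit classifier on the edge, and a hand-off to a server-side "tail" when the exit is not confident. All names and the confidence threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def predefined_sparse_layer(in_dim, out_dim, density=0.25):
    """Weights plus a binary mask whose support is fixed *before* training
    (predefined sparsity): only ~density of the connections ever exist."""
    w = rng.standard_normal((out_dim, in_dim)) * 0.1
    mask = rng.random((out_dim, in_dim)) < density  # fixed connectivity pattern
    return w * mask, mask

def forward(x, w):
    return np.maximum(x @ w.T, 0.0)  # linear layer + ReLU

# Edge side: sparse "head" of the split network, with an early-exit classifier.
w_head, m_head = predefined_sparse_layer(32, 16)
w_exit, _ = predefined_sparse_layer(16, 4)

x = rng.standard_normal(32)            # input sample on the edge device
h = forward(x, w_head)                 # features at the split point
logits = h @ w_exit.T
probs = np.exp(logits - logits.max())
probs /= probs.sum()

if probs.max() > 0.9:                  # confident: early exit, no server round-trip
    pred = int(probs.argmax())
else:                                  # otherwise ship h to the server-side tail
    w_tail, _ = predefined_sparse_layer(16, 4)
    pred = int(forward(h, w_tail).argmax())
```

Because the mask is chosen before training, the zero weights never need to be stored or multiplied, which is where the storage and compute savings come from on both sides of the split.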
Pre-Defined Sparse Neural Networks with Hardware Acceleration
Dey, Sourya, Huang, Kuan-Wen, Beerel, Peter A., Chugg, Keith M.
As more data have become available, the size and complexity of neural networks (NNs) have risen sharply, with modern NNs containing millions or even billions of trainable parameters [1], [2]. These massive NNs come with the cost of large computational and storage demands. The current state of the art is to train large NNs on Graphics Processing Units (GPUs) in the cloud - a process that can take days to weeks even on powerful GPUs [1]-[3] or similar programmable processors with multiply-accumulate accelerators [4]. Once trained, the model can be used for inference, which is less computationally intensive and is typically performed on more general-purpose processors (i.e., Central Processing Units (CPUs)). It is increasingly desirable to run inference, and even some retraining, on embedded processors, which have limited resources for computation and storage. In this regard, model reduction has been identified as a key to NN acceleration by several prominent researchers [5]. This is generally performed post-training to reduce the memory required to store the model for inference - e.g., via quantization, compression, and parameter grouping [6]-[9]. Decreasing the time, computation, storage, and energy costs of training and inference is therefore a highly relevant goal.