 acceleration scheme





AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training

arXiv.org Artificial Intelligence

It is usually infeasible to fit and train an entire large deep neural network (DNN) model on a single edge device because of its limited resources. To support intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models and deploying each on a different edge device, so that the devices train the DNN model collaboratively. However, two factors can significantly slow down training: the communication overhead caused by the large amount of data transmitted between devices during training, and sub-optimal partition points that result from inaccurate prediction of the computation latency at each edge device. In this paper, we propose AccEPT, an acceleration scheme for edge collaborative pipeline-parallel training. In particular, we propose a lightweight adaptive latency predictor that accurately estimates the computation latency of each layer on different devices and adapts to unseen devices through continuous learning. The proposed latency predictor therefore leads to better model partitioning, which balances the computation load across participating devices. Moreover, we propose a bit-level, computation-efficient data compression scheme to compress the data transmitted between devices during training. Our numerical results demonstrate that the proposed approach speeds up edge pipeline-parallel training by up to 3 times in the considered experimental settings.
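To make the link between latency prediction and partitioning concrete, the following is a minimal sketch of how per-layer latency estimates could drive the choice of partition points. It is not the AccEPT algorithm: the function name `partition_layers`, the input layout `latency[d][l]`, the fixed device order, and the decision to ignore communication cost are all assumptions made for illustration. It assigns contiguous layer ranges to devices so that the slowest pipeline stage is as fast as possible.

```python
from itertools import accumulate


def partition_layers(latency):
    """Split layers into contiguous stages, one stage per device.

    latency[d][l] is a hypothetical predicted latency of layer l on device d.
    Devices are used in the given order and communication cost is ignored
    (both are simplifying assumptions). Requires at least as many layers as
    devices.

    Returns (bottleneck, cuts), where cuts[d] = (start, end) is the half-open
    layer range placed on device d and bottleneck is the slowest stage time.
    """
    n_dev = len(latency)
    n_layers = len(latency[0])
    # prefix[d][j] = total latency of layers 0..j-1 on device d
    prefix = [[0.0] + list(accumulate(row)) for row in latency]
    inf = float("inf")
    # best[d][j] = smallest bottleneck when the first j layers use the first d devices
    best = [[inf] * (n_layers + 1) for _ in range(n_dev + 1)]
    split = [[0] * (n_layers + 1) for _ in range(n_dev + 1)]
    best[0][0] = 0.0
    for d in range(1, n_dev + 1):
        for j in range(d, n_layers + 1):
            for i in range(d - 1, j):
                # device d-1 (0-based) runs layers i..j-1
                stage = prefix[d - 1][j] - prefix[d - 1][i]
                cand = max(best[d - 1][i], stage)
                if cand < best[d][j]:
                    best[d][j] = cand
                    split[d][j] = i
    # Walk the split table backwards to recover the layer ranges.
    cuts = []
    j = n_layers
    for d in range(n_dev, 0, -1):
        i = split[d][j]
        cuts.append((i, j))
        j = i
    cuts.reverse()
    return best[n_dev][n_layers], cuts


# Two devices, four layers; the second device is twice as slow per layer.
bottleneck, cuts = partition_layers([[1, 1, 1, 1], [2, 2, 2, 2]])
print(bottleneck, cuts)
```

In this toy setting the faster device receives three layers and the slower one receives a single layer. If the predicted latencies were inaccurate, the same procedure would place the cut at a worse point, which is the failure mode the abstract attributes to poor latency prediction.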


An Acceleration Scheme to The Local Directional Pattern

arXiv.org Artificial Intelligence

This study seeks to improve the running time of the Local Directional Pattern (LDP) during feature extraction using a newly proposed acceleration scheme for LDP. LDP is considered to be computationally expensive. To confirm this, Shabat and Tapamo compared the running time of LDP with that of the gray level co-occurrence matrix (GLCM) and established that the running time of LDP was two orders of magnitude higher than that of the GLCM. In this study, the performance of the newly proposed acceleration scheme was evaluated against LDP and the Local Binary Pattern (LBP) using images from the publicly available extended Cohn-Kanade (CK) dataset. Based on our findings, the proposed acceleration scheme improves the running time of LDP by almost 3 times during feature extraction.
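For context on why LDP is costly, the following is a minimal sketch of the commonly described baseline LDP code for a single 3x3 patch: eight Kirsch edge responses are computed, and the bits of the k strongest absolute responses are set. It illustrates the unaccelerated descriptor only, not the acceleration scheme proposed in the study. The function names, the bit ordering (bit 0 is east, proceeding counter-clockwise), and the tie-breaking rule (lowest direction index wins) are assumptions made for illustration.

```python
# Ring of neighbours around the centre pixel, starting at east and going
# counter-clockwise: E, NE, N, NW, W, SW, S, SE (ordering is an assumption).
RING = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]


def kirsch_responses(patch):
    """Return the eight Kirsch mask responses for a 3x3 patch.

    The mask for direction i weights the three ring pixels centred on
    direction i by 5 and the remaining five ring pixels by -3.
    """
    ring = [patch[r][c] for r, c in RING]
    total = sum(ring)
    responses = []
    for i in range(8):
        three = ring[i - 1] + ring[i] + ring[(i + 1) % 8]
        responses.append(5 * three - 3 * (total - three))
    return responses


def ldp_code(patch, k=3):
    """Return the 8-bit LDP code with the k strongest directions set.

    Ties are broken by the lowest direction index, which is an assumption
    of this sketch.
    """
    mags = [abs(m) for m in kirsch_responses(patch)]
    top = sorted(range(8), key=lambda i: (-mags[i], i))[:k]
    return sum(1 << i for i in top)


# A vertical edge: bright column on the right of the patch.
patch = [[0, 0, 10],
         [0, 0, 10],
         [0, 0, 10]]
print(kirsch_responses(patch), ldp_code(patch))
```

Each pixel requires eight mask responses plus a selection of the top k. That per-pixel workload gives a sense of why LDP is slower than simpler descriptors.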