
Collaborating Authors

 Jiang, Jingbo


Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated Learning

arXiv.org Artificial Intelligence

Substantial efforts have been devoted to alleviating the impact of the long-tailed class distribution in federated learning. In this work, we observe an interesting phenomenon: weak classes consistently exist even for class-balanced learning. These weak classes, different from the minority classes in previous works, are inherent to the data and remain fairly consistent across various network structures and learning paradigms. The inherent inter-class accuracy discrepancy can reach over 36.9% for federated learning on the FashionMNIST and CIFAR-10 datasets, even when the class distribution is balanced both globally and locally. In this study, we empirically analyze the potential reasons for this phenomenon. Furthermore, a class-specific partial knowledge distillation method is proposed to improve the model's classification accuracy for weak classes, in which knowledge transfer is initiated upon the occurrence of specific misclassifications within certain weak classes. Experimental results show that the accuracy of weak classes can be improved by 10.7%, effectively reducing the inherent inter-class discrepancy.
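
The trigger condition described in the abstract (knowledge transfer initiated only upon specific misclassifications within weak classes) can be pictured with a short sketch. The PyTorch snippet below is a minimal illustration under assumed names: `weak_classes`, `temperature`, and the masking rule are hypothetical choices for the example, not the paper's exact formulation.

```python
# Minimal sketch (PyTorch): a distillation term applied only to samples whose
# label lies in a hypothetical set of weak classes AND that the student
# currently misclassifies. All hyperparameters here are illustrative.
import torch
import torch.nn.functional as F

def partial_kd_loss(student_logits, teacher_logits, labels,
                    weak_classes, temperature=2.0):
    preds = student_logits.argmax(dim=1)
    mask = torch.isin(labels, weak_classes) & (preds != labels)
    if not mask.any():                      # no triggering misclassification in this batch
        return student_logits.new_zeros(())
    s = F.log_softmax(student_logits[mask] / temperature, dim=1)
    t = F.softmax(teacher_logits[mask] / temperature, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Example usage: combine with the usual cross-entropy on all samples.
logits_s = torch.randn(32, 10)              # student outputs
logits_t = torch.randn(32, 10)              # teacher outputs
labels = torch.randint(0, 10, (32,))
weak = torch.tensor([2, 4, 6])              # hypothetical weak classes
loss = F.cross_entropy(logits_s, labels) + 0.5 * partial_kd_loss(
    logits_s, logits_t, labels, weak)
```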


Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation

arXiv.org Artificial Intelligence

The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for implementation in regular architectures but tends to have higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a model compression method based on a novel weight permutation scheme to fully exploit the fine-grained weight sparsity in the hardware design. Through permutation, the optimal arrangement of the weight matrix is obtained, and the sparse weight matrix is further compressed to a small and dense format to make full use of the hardware resources. Two pruning granularities are explored. In addition to the unstructured weight pruning, we also propose a more fine-grained subword-level pruning to further improve the compression performance. Compared to the state-of-the-art works, the matrix compression rate is significantly improved from 5.88x to 14.13x. As a result, the throughput and energy efficiency are improved by 2.75 and 1.86 times, respectively.
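
As a rough picture of how reordering rows can let a pruned (sparse) weight matrix be stored in a smaller dense format, the NumPy sketch below greedily folds rows with disjoint nonzero patterns into shared physical rows, keeping per-weight row-index metadata. The folding heuristic and metadata layout are assumptions made for illustration; they are not the paper's permutation search, hardware mapping, or subword-level pruning scheme.

```python
# Toy sketch (NumPy): fold rows whose nonzero supports do not overlap into one
# dense physical row, so a sparse matrix occupies fewer hardware rows.
import numpy as np

def fold_rows(w):
    """Greedily merge rows with disjoint nonzero columns into shared physical rows."""
    n_rows, n_cols = w.shape
    groups = []                                   # each group: list of original row ids
    occupied = []                                 # boolean column mask per group
    for r in range(n_rows):
        support = w[r] != 0
        for g, used in enumerate(occupied):
            if not np.any(used & support):        # row fits into this physical row
                groups[g].append(r)
                occupied[g] = used | support
                break
        else:                                     # no fit found: open a new physical row
            groups.append([r])
            occupied.append(support.copy())
    packed_vals = np.zeros((len(groups), n_cols), dtype=w.dtype)
    packed_rows = np.full((len(groups), n_cols), -1, dtype=np.int32)
    for g, members in enumerate(groups):
        for r in members:
            cols = np.flatnonzero(w[r])
            packed_vals[g, cols] = w[r, cols]
            packed_rows[g, cols] = r              # original logical row of each weight
    return packed_vals, packed_rows

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16)) * (rng.random((16, 16)) < 0.2)   # ~80% pruned
vals, rows = fold_rows(w)
print(w.shape, "->", vals.shape)                  # e.g. (16, 16) -> far fewer dense rows
```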


A Reconfigurable Winograd CNN Accelerator with Nesting Decomposition Algorithm for Computing Convolution with Large Filters

arXiv.org Artificial Intelligence

Recent literature found that convolutional neural networks (CNNs) with large filters perform well in some applications such as image semantic segmentation. The Winograd transformation helps to reduce the number of multiplications in a convolution but suffers from numerical instability when the convolution filter size gets large: it maps the feature maps and filters into a fractional number field by multiplying them with fixed matrices derived from a Vandermonde matrix, whose entries grow exponentially with the matrix size. Thus, multiplying the data with a large number may cause computation overflow, and dividing the data with a large number makes the computation suffer from quantization error. Compared with FFT, the Winograd algorithm nevertheless appears to be more popular in recent CNN accelerators. This work proposes a nested Winograd algorithm for computing convolutions with large filters. Compared with the state-of-the-art OLA-Winograd algorithm, the proposed algorithm reduces the multiplications by 1.41 to 3.29 times for computing 5×5 to 9×9 convolutions.
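
For background, the snippet below shows the standard 1-D Winograd F(2,3) transform that such accelerators build on: two outputs of a 3-tap filter computed with four element-wise multiplications instead of six, using fixed transform matrices. It is only a background sketch; the matrices are the textbook F(2,3) ones, and the paper's nesting decomposition for large filters is not reproduced here.

```python
# Minimal sketch (NumPy) of 1-D Winograd F(2,3): y = A^T [(G g) * (B^T d)].
# Larger tiles require transform matrices with larger entries, which is the
# numerical-instability issue the abstract refers to.
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)     # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                  # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)      # output transform

def winograd_f23(d, g):
    """y[i] = sum_k d[i+k] * g[k] for i in {0, 1}, with 4 multiplications."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])                # 4 input samples
g = np.array([0.5, -1.0, 2.0])                    # 3-tap filter
print(winograd_f23(d, g))                         # [4.5, 6.0]
print([np.dot(d[i:i + 3], g) for i in range(2)])  # matches direct computation
```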