
Collaborating Authors

Tsui, Chi-Ying


Partial Knowledge Distillation for Alleviating the Inherent Inter-Class Discrepancy in Federated Learning

arXiv.org Artificial Intelligence

Substantial efforts have been devoted to alleviating the impact of long-tailed class distributions in federated learning. In this work, we observe an interesting phenomenon: weak classes consistently exist even in class-balanced learning. These weak classes, unlike the minority classes studied in previous works, are inherent to the data and remain fairly consistent across various network structures and learning paradigms. The inherent inter-class accuracy discrepancy can exceed 36.9% for federated learning on the FashionMNIST and CIFAR-10 datasets, even when the class distribution is balanced both globally and locally. We empirically analyze the potential reasons for this phenomenon. Furthermore, a class-specific partial knowledge distillation method is proposed to improve the model's classification accuracy on weak classes: knowledge transfer is initiated only upon the occurrence of specific misclassifications within certain weak classes. Experimental results show that the accuracy of weak classes can be improved by 10.7%, effectively reducing the inherent inter-class discrepancy.
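
To make the trigger condition concrete, here is a minimal PyTorch-style sketch of a class-specific partial distillation loss: the distillation term fires only for weak-class samples that the student currently misclassifies. The names (`teacher_logits`, `weak_classes`), the trigger condition, and the temperature are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def partial_kd_loss(student_logits, teacher_logits, labels,
                    weak_classes, temperature=2.0):
    """Distill only on weak-class samples the student misclassifies;
    all other samples contribute zero. `weak_classes` is an assumed
    set of class indices identified beforehand as weak."""
    preds = student_logits.argmax(dim=1)
    weak = torch.isin(labels, torch.tensor(sorted(weak_classes),
                                           device=labels.device))
    mask = weak & (preds != labels)  # trigger: misclassified weak-class sample
    if not mask.any():
        return student_logits.new_zeros(())
    t = temperature
    p_teacher = F.softmax(teacher_logits[mask] / t, dim=1)
    log_p_student = F.log_softmax(student_logits[mask] / t, dim=1)
    # Standard KL distillation term, scaled by T^2 as in Hinton et al.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * t * t

# In training, this term would be added to the usual cross-entropy loss;
# the weighting coefficient (here 0.5) is likewise an assumption:
# loss = F.cross_entropy(student_logits, labels) \
#        + 0.5 * partial_kd_loss(student_logits, teacher_logits,
#                                labels, weak_classes)
```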


How Robust is Federated Learning to Communication Error? A Comparison Study Between Uplink and Downlink Channels

arXiv.org Artificial Intelligence

Because of its privacy-preserving capability, federated learning (FL) has attracted significant attention from both academia and industry. However, when FL is implemented over wireless networks, it is not clear how much communication error it can tolerate. This paper investigates the robustness of FL to uplink and downlink communication errors. Our theoretical analysis reveals that this robustness depends on two critical parameters, namely the number of clients and the numerical range of the model parameters. It is also shown that the uplink communication in FL can tolerate a higher bit error rate (BER) than the downlink communication, and this difference is quantified by a proposed formula. The findings and theoretical analyses are further validated by extensive experiments.
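
A toy simulation can make the uplink/downlink asymmetry concrete: on the uplink, each client's update is corrupted independently and the errors are diluted by averaging over many clients, whereas on the downlink a single corrupted copy of the global model reaches everyone. The bit-flip channel model and all parameter values below are illustrative assumptions, not the paper's analytical setup; note also how a flipped exponent bit can blow a parameter up, which is why the numerical range of the parameters matters.

```python
import numpy as np

def flip_bits(arr, ber, rng):
    """Flip each bit of a float32 array independently with probability
    `ber`, emulating an unprotected binary channel (an assumption)."""
    bits = arr.astype(np.float32).view(np.uint32)
    for b in range(32):
        mask = rng.random(bits.shape) < ber
        bits = np.where(mask, bits ^ np.uint32(1 << b), bits)
    return bits.view(np.float32)

rng = np.random.default_rng(0)
n_clients, dim, ber = 50, 10_000, 1e-6
updates = [rng.normal(0, 0.01, dim).astype(np.float32)
           for _ in range(n_clients)]
clean = np.mean(updates, axis=0)

# Uplink: every client update is corrupted independently, then averaged,
# so each bit error is scaled down by roughly 1/n_clients.
uplink = np.mean([flip_bits(u, ber, rng) for u in updates], axis=0)

# Downlink: the already-averaged global model is corrupted once; all
# clients receive the same corrupted parameters, with no averaging to help.
downlink = flip_bits(clean, ber, rng)

print("uplink max deviation:  ", np.abs(uplink - clean).max())
print("downlink max deviation:", np.abs(downlink - clean).max())
```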


Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation

arXiv.org Artificial Intelligence

The unstructured sparsity after pruning poses a challenge to the efficient implementation of deep learning models in existing regular architectures like systolic arrays. On the other hand, coarse-grained structured pruning is suitable for implementation in regular architectures but tends to have higher accuracy loss than unstructured pruning when the pruned models are of the same size. In this work, we propose a model compression method based on a novel weight permutation scheme to fully exploit the fine-grained weight sparsity in the hardware design. Through permutation, the optimal arrangement of the weight matrix is obtained, and the sparse weight matrix is further compressed to a small and dense format to make full use of the hardware resources. Two pruning granularities are explored. In addition to the unstructured weight pruning, we also propose a more fine-grained subword-level pruning to further improve the compression performance. Compared to the state-of-the-art works, the matrix compression rate is significantly improved from 5.88x to 14.13x. As a result, the throughput and energy efficiency are improved by 2.75 and 1.86 times, respectively.
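
The following sketch illustrates the packing idea in its simplest form: logical rows of an unstructured-sparse matrix whose nonzero columns are disjoint are permuted into the same physical row, yielding a smaller, denser matrix. The greedy heuristic and the `pack_rows` helper are illustrative assumptions; the paper's permutation scheme searches for an optimal arrangement under systolic-array constraints and additionally exploits subword-level pruning.

```python
import numpy as np

def pack_rows(W):
    """Greedily group rows whose nonzero column supports are disjoint,
    so several sparse logical rows share one dense physical row.
    A toy packing heuristic, not the paper's permutation search."""
    supports = [set(np.flatnonzero(row)) for row in W]
    order = sorted(range(len(W)), key=lambda i: -len(supports[i]))
    groups = []  # each group: (occupied columns, member row indices)
    for i in order:
        for cols, members in groups:
            if supports[i].isdisjoint(cols):  # fits into this physical row
                cols |= supports[i]
                members.append(i)
                break
        else:
            groups.append((set(supports[i]), [i]))
    packed = np.zeros((len(groups), W.shape[1]), dtype=W.dtype)
    for g, (_, members) in enumerate(groups):
        for i in members:
            packed[g] += W[i]  # supports are disjoint, so no collisions
    return packed, [m for _, m in groups]

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 32)) * (rng.random((16, 32)) < 0.2)  # ~80% sparse
packed, groups = pack_rows(W)
print(W.shape, "->", packed.shape)  # fewer physical rows after packing
```

The hardware then only needs the packed matrix plus the per-group row indices to route each product back to the correct logical output.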


A Reconfigurable Winograd CNN Accelerator with Nesting Decomposition Algorithm for Computing Convolution with Large Filters

arXiv.org Artificial Intelligence

Recent literature has found that convolutional neural networks (CNN) with large filters perform well in some applications such as image semantic segmentation. Compared with the FFT, the Winograd algorithm appears to be more popular in recent CNN accelerators, since it helps to reduce the number of multiplications in a convolution. However, it suffers from numerical instability when the convolution filter size gets large: the Winograd transformation maps the feature maps and filters into a fractional number field by multiplying them with some fixed matrices, which are derived from a Vandermonde matrix whose entry values grow exponentially with the matrix size. Thus, multiplying the data with a large number may make the computation overflow, and dividing the data with a large number makes the computation suffer from quantization error. This work proposes a nested Winograd algorithm that decomposes a large filter into a sequence of smaller ones. Compared with the state-of-the-art OLA-Winograd algorithm, the proposed algorithm reduces the multiplications by 1.41 to 3.29 times for computing 5×5 to 9×9 convolutions.
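
To make the mechanism concrete, here is the standard F(2,3) Winograd base case in NumPy: two outputs of a 3-tap correlation computed with four multiplications (instead of six) via the fixed transform matrices B^T, G, and A^T. The nesting idea then treats each tap of a larger filter as another small filter and applies such small transforms recursively, rather than building one large, numerically unstable Vandermonde-derived transform; this sketch shows only the base case, not the paper's full nesting decomposition.

```python
import numpy as np

# Fixed F(2,3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float64)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)

def winograd_f23(d, g):
    """y[i] = sum_k g[k] * d[i+k] for i in {0, 1}: a valid correlation of
    a 4-sample input tile d with a 3-tap filter g, using 4 multiplies."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
print(winograd_f23(d, g))                  # Winograd result
print(np.correlate(d, g, mode="valid"))    # direct reference result
```

Because the transform matrices stay this small at every level of the recursion, the nested scheme sidesteps the exponential growth of entry values that a single large transform would incur.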