FedNTD
A Table of Notations
Table 3: Table of Notations throughout the paper. Indices: c, c′ index classes (c ∈ {1, …, C} = [C]); i indexes data (i ∈ {1, …, N} = [N]); k, k′ …
The softened softmax probability is calculated without the true-class logit on the server/client model. Class distribution on datasets: p = [p…] For the federated learning situation, we calculate this measure on the server model.

Here we provide details of our experimental setups. Multi-GPU training is not conducted in the paper experiments. The details about each dataset and setup are described in Table 4. For CIFAR-100, we add Cutout [12] augmentation. We use a momentum SGD optimizer with an initial learning rate of 0.01, and the momentum is set as …; the learning rate is decayed by a factor of 0.99 at each round. In the motivational experiment in Section 3, we fix the learning rate at 0.01. Since we assume a synchronized federated learning scenario, parallel distributed learning is simulated by sequentially training the sampled clients and then aggregating them into a global model. For the implemented algorithms, we search hyperparameters and choose the best among the candidates. The hyperparameters for each algorithm are in Table 5. Under the sharding strategy, the size of the local datasets is identical.

The conceptual illustration of federated distillation methods is in Figure 9. On the other hand, our proposed FedNTD does not have such constraints (Figure 9c). (Table: additional resource requirements compared to FedAvg — e.g., no additional requirement on statefulness.)

We extend the motivational experiment in Section 3.1 to the main experimental setups. The value in parentheses is the forgetting F. We report an additional experiment on a popular architecture, ResNet-10, which is about 10× larger than the 2-conv + 2-fc model used in the main experiments. The result is plotted in Figure 11. Here we investigate the personalized performance of our FedNTD.
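The synchronized-round simulation described above (sequentially train sampled clients, then average them into a global model, with momentum SGD and a 0.99 per-round learning-rate decay) can be sketched as follows. This is a minimal illustration on a toy linear objective, not the paper's implementation; the momentum value 0.9, the sampling fraction, and the client objective are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.01, momentum=0.9, epochs=1):
    """Momentum-SGD on a toy linear least-squares client objective."""
    v = np.zeros_like(w)
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        v = momentum * v - lr * grad
        w = w + v
    return w

def simulate_round(w_global, clients, sample_frac=0.5, lr=0.01):
    """One synchronized round: train sampled clients sequentially, then average."""
    k = max(1, int(sample_frac * len(clients)))
    idx = rng.choice(len(clients), size=k, replace=False)
    local_ws = [local_sgd(w_global.copy(), *clients[i], lr=lr) for i in idx]
    # Uniform FedAvg aggregation (local dataset sizes are identical under sharding)
    return np.mean(local_ws, axis=0)

# Run a few rounds with the 0.99 per-round learning-rate decay
w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
lr = 0.01
for _ in range(5):
    w = simulate_round(w, clients, lr=lr)
    lr *= 0.99
```

In a real federated setup each `local_sgd` call would be a full local training pass on a client model copy; the sequential loop emulates parallel clients because rounds are synchronized.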
The results are in Table 13, and Figure 13 shows the corresponding learning curves. FedNTD consistently improves performance even in such cases. Figure 13: Learning curves corresponding to Table 13. The loss term introduced by FedAlign aims to seek out-of-distribution generality w.r.t. … Figure 14: Loss space of the learned model (Client 16 / LDA α = 0.5).
HYDRA-FL: Hybrid Knowledge Distillation for Robust and Accurate Federated Learning
Khan, Momin Ahmad, Chandio, Yasra, Anwar, Fatima Muhammad
Data heterogeneity among Federated Learning (FL) users poses a significant challenge, resulting in reduced global model performance. The community has designed various techniques to tackle this issue, among which Knowledge Distillation (KD)-based techniques are common. While these techniques effectively improve performance under high heterogeneity, they inadvertently cause higher accuracy degradation under model poisoning attacks (known as attack amplification). This paper presents a case study to reveal this critical vulnerability in KD-based FL systems. We show why KD causes this issue through empirical evidence and use it as motivation to design a hybrid distillation technique. We introduce a novel algorithm, Hybrid Knowledge Distillation for Robust and Accurate FL (HYDRA-FL), which reduces the impact of attacks in attack scenarios by offloading some of the KD loss to a shallow layer via an auxiliary classifier. We model HYDRA-FL as a generic framework and adapt it to two KD-based FL algorithms, FedNTD and MOON. Using these two as case studies, we demonstrate that our technique outperforms baselines in attack settings while maintaining comparable performance in benign settings.
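The hybrid objective described in the abstract — attenuating the distillation term at the output layer and offloading part of it to an auxiliary classifier attached at a shallow layer — can be sketched as below. The names `gamma`, `lam`, and `shallow_logits` are illustrative assumptions; the actual auxiliary-classifier placement and loss weights follow the HYDRA-FL paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=3.0):
    """KL(teacher || student) with temperature — the usual distillation term."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

def hydra_loss(ce, final_logits, shallow_logits, teacher_logits,
               gamma=0.3, lam=0.7):
    """Hybrid objective: a reduced distillation weight (gamma) at the output
    layer, plus distillation (lam) on an auxiliary shallow-layer classifier."""
    return (ce
            + gamma * kd_loss(final_logits, teacher_logits)
            + lam * kd_loss(shallow_logits, teacher_logits))
```

Lowering `gamma` limits how strongly a (possibly poisoned) global teacher can steer the final prediction layer, while the shallow auxiliary term preserves the heterogeneity-mitigation benefit of distillation.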