Re-Weighted Softmax Cross-Entropy to Control Forgetting in Federated Learning

Legate, Gwen, Caccia, Lucas, Belilovsky, Eugene

arXiv.org Artificial Intelligence 

In Federated Learning, a global model is learned by aggregating model updates computed at a set of independent client nodes; to reduce communication costs, multiple gradient steps are performed at each node prior to aggregation. A key challenge in this setting is data heterogeneity across clients, which results in differing local objectives and can lead clients to overly minimize their own local objective, diverging from the global solution. We demonstrate that individual client models experience catastrophic forgetting with respect to data from other clients and propose an efficient approach that modifies the cross-entropy objective on a per-client basis by re-weighting the softmax logits prior to computing the loss. This approach shields classes outside a client's label set from abrupt representation change, and we empirically demonstrate that it can alleviate client forgetting and provide consistent improvements to standard federated learning algorithms. Our method is particularly beneficial in the most challenging federated learning settings, where data heterogeneity is high and client participation in each round is low.

Federated Learning (FL) is a distributed machine learning paradigm in which a shared global model is learned from a decentralized set of data located at a number of independent client nodes (McMahan et al., 2017; Konečnỳ et al., 2016). Driven by communication constraints, FL algorithms typically perform a number of local gradient update steps before synchronizing with the global model. This reduced-communication strategy is very effective in independent and identically distributed (i.i.d.) settings, but data heterogeneity across clients has direct implications for the convergence and performance of FL algorithms (Zhao et al., 2018).
FL was conceptualized as a technique to train a shared model without sharing user-sensitive data, while allowing users to benefit from data stored at other nodes, such as the phones and tablets of decentralized users. Under realistic settings, client data will often have non-i.i.d.
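The re-weighting idea described in the abstract can be sketched concretely. The following is a minimal, hypothetical illustration (not the authors' exact implementation): each client scales the softmax terms by per-class weights, e.g. proportional to local class frequency, so that classes absent from the client's label set contribute almost nothing to the normalizing sum and are thereby shielded from large gradient updates. Multiplying exp(z_k) by a weight w_k is equivalent to adding log(w_k) to the logit z_k.

```python
import numpy as np

def reweighted_softmax_ce(logits, target, class_weights):
    """Cross-entropy with per-class re-weighting inside the softmax.

    Illustrative sketch: `class_weights` is assumed proportional to each
    class's frequency in the client's local data; classes absent locally
    receive (near-)zero weight, so their logits barely enter the
    softmax normalizer.
    """
    # w_k * exp(z_k) == exp(z_k + log w_k); clamp to avoid log(0).
    shifted = logits + np.log(np.maximum(class_weights, 1e-12))
    shifted -= shifted.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[target]

# A client that only observes classes 0 and 1 out of 4 total classes:
weights = np.array([0.5, 0.5, 0.0, 0.0])
logits = np.array([2.0, 1.0, 3.0, 0.5])
loss = reweighted_softmax_ce(logits, target=0, class_weights=weights)
```

In this example the loss effectively reduces to a softmax over the two locally present classes, so the large logit on the absent class 2 neither inflates the loss nor receives a large corrective gradient.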
