Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

Neural Information Processing Systems

Scaling up the convolutional neural network (CNN) size (e.g., width or depth) is known to effectively improve model accuracy. However, the large model size impedes training on resource-constrained edge devices. For instance, federated learning (FL) may place undue burden on the compute capability of edge nodes, even though there is a strong practical need for FL due to its privacy and confidentiality properties. To address the resource-constrained reality of edge devices, we reformulate FL as a group knowledge transfer training algorithm, called FedGKT. FedGKT designs a variant of the alternating minimization approach to train small CNNs on edge nodes and periodically transfer their knowledge by knowledge distillation to a large server-side CNN.
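The knowledge-distillation step the abstract describes can be illustrated with a small numerical sketch. This is not the authors' FedGKT implementation; the function name, the temperature, and the weighting between the hard-label and distillation terms are all illustrative assumptions, shown here only to make the "transfer knowledge by knowledge distillation" step concrete.

```python
import numpy as np

def softmax(z, t=1.0):
    # Temperature-scaled, numerically stable softmax over the class axis.
    z = z / t
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, temperature=3.0, alpha=0.5):
    """Hypothetical distillation objective: a weighted sum of
    (a) cross-entropy on the true labels and
    (b) KL divergence between temperature-softened class distributions."""
    # (a) cross-entropy of the student on hard labels
    p = softmax(student_logits)
    ce = -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))
    # (b) KL(teacher || student) on softened distributions, scaled by T^2
    ps = softmax(student_logits, temperature)
    pt = softmax(teacher_logits, temperature)
    kl = np.mean(np.sum(pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12)), axis=1))
    kl = kl * temperature ** 2
    return alpha * ce + (1 - alpha) * kl

# Toy usage: when student and teacher agree, only the hard-label term remains.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]])
labels = np.array([0, 1])
loss = kd_loss(logits, logits, labels)
```

In a FedGKT-style scheme this loss would be applied in both directions: the server-side CNN distills from the edge models' logits, and the edge models in turn distill from the server's logits, alternating between the two.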


Review for NeurIPS paper: Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

Neural Information Processing Systems

Weaknesses: The pseudocode is critical to understanding the proposed approach precisely; however, notational inconsistencies make it considerably harder to understand (see comments under "clarity" below). For this work to be well motivated, more details need to be provided on the real-world scenarios in which it might be helpful, and on how the constraints or characteristics of those settings are addressed by the algorithm. For example, the description seems to imply that all clients participate in every round, which would rule out application to the cross-device FL setting (see [A], Table 1). Similarly, it is worth clarifying whether clients need to maintain state across rounds, which is typically also not possible in cross-device settings.


Review for NeurIPS paper: Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

Neural Information Processing Systems

The paper provides a new algorithm for federated learning with resource-constrained edge devices. The algorithm adapts distillation-based techniques (usually used for model compression from a larger model to a smaller one) into a two-way knowledge-transfer scheme that aids the learning of small local neural networks on the edge devices and of a larger global network on the server cloud. Methodologically, the paper is novel, useful, and well written. However, a few points raised by the reviewers are very pertinent and need to be discussed in the final version: 1. One key advantage and motivation for the model is stated as reduced communication; unfortunately, this has not been empirically justified against FedAvg. The method has the potential for less frequent communication than FedAvg, but this has not been validated empirically, and it would be good to have this information for the reported experiments. 2. Exchanging features instead of parameters is stated as an advantage, but I agree with R1 and R3's concern that this may not hold for now-standard networks on high-resolution images, where the per-iteration communication scales as #samples x #hidden units (or features), which could be large.
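The reviewers' scaling concern can be made concrete with a back-of-envelope calculation. All numbers below are illustrative assumptions (they do not come from the paper or the reviews): a CIFAR-scale local dataset, a feature map of a plausible edge-CNN size, and a ResNet-18-scale server model.

```python
# Illustrative per-round communication comparison (all figures are assumptions).
samples_per_client = 5_000          # assumed local dataset size
feature_dim = 16 * 32 * 32          # assumed feature-map size from a small edge CNN
large_model_params = 11_000_000     # assumed ResNet-18-scale parameter count
bytes_per_float = 4                 # float32

# Feature exchange scales with #samples x #hidden units, as the meta-review notes.
feature_bytes = samples_per_client * feature_dim * bytes_per_float

# FedAvg-style parameter exchange scales with the model size only.
fedavg_bytes = large_model_params * bytes_per_float

print(f"features: {feature_bytes / 1e6:.0f} MB, parameters: {fedavg_bytes / 1e6:.0f} MB")
```

Under these assumed numbers the feature payload is several times larger than a full model update, which is precisely why the meta-review asks for an empirical communication comparison against FedAvg rather than taking the advantage as given.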

