Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning
