Robust Layerwise Scaling Rules by Proper Weight Decay Tuning
