No More Adam: Learning Rate Scaling at Initialization is All You Need

Open in new window