A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
