A Proof of Learning Rate Transfer under $μ$P

Open in new window