449590dfd5789cc7043f85f8bb7afa47-Supplemental-Conference.pdf
Neural Information Processing Systems
For any $t \ge 0$, we have $\|\Delta_t\| \le c_3 \alpha^2 \tau L_{\max}^2 \kappa_{\max}^2 E_0^{-1}$ due to either the initialization ($t = 0$) or $A_3(t)$ ($t > 0$).
Feb-8-2026, 15:34:34 GMT