Reparameterizing Mirror Descent as Gradient Descent
Neural Information Processing Systems
For this, we first consider the ±-trick on (18), in which we set $w(t) = w_+(t) - w_-(t)$, where $\frac{d}{dt} \log w_+(t) = -\nabla_w L(w(t))$ and $\frac{d}{dt} \log w_-(t) = +\nabla_w L(w(t))$.
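To make the ±-trick concrete, the following is a minimal sketch of a forward-Euler discretization of the two log-domain flows. Everything beyond the update rule itself is an assumption made for illustration: the quadratic loss L(w) = 0.5 * ||w - target||^2, the step size, the number of steps, the initialization, and the function names `grad` and `plus_minus_flow` are all hypothetical and do not come from the paper.

```python
import math


def grad(w, target):
    # Gradient of the illustrative loss L(w) = 0.5 * ||w - target||^2.
    # This loss is an assumption chosen only to keep the example small.
    return [wi - ti for wi, ti in zip(w, target)]


def plus_minus_flow(target, step=0.01, num_steps=5000, init=1.0):
    # Two strictly positive parameter vectors; their difference is the weight w.
    w_plus = [init] * len(target)
    w_minus = [init] * len(target)
    for _ in range(num_steps):
        w = [p - m for p, m in zip(w_plus, w_minus)]
        g = grad(w, target)
        # Euler step on d/dt log w_+ = -grad L(w): multiplicative decrease.
        w_plus = [p * math.exp(-step * gi) for p, gi in zip(w_plus, g)]
        # Euler step on d/dt log w_- = +grad L(w): multiplicative increase.
        w_minus = [m * math.exp(step * gi) for m, gi in zip(w_minus, g)]
    return [p - m for p, m in zip(w_plus, w_minus)]


if __name__ == "__main__":
    print(plus_minus_flow([3.0, -2.0, 0.5]))
```

Because both factors are updated multiplicatively, w_+ and w_- stay positive throughout, while their difference can take either sign. That is why the sketch can reach the negative component of the target.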
Feb-8-2026, 14:55:41 GMT