Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent Zhiyuan Li

Neural Information Processing Systems 

As part of the effort to understand implicit bias of gradient descent in over-parametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all the previous results in this setting.