Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits