Adam with model exponential moving average is effective for nonconvex optimization
