Adam with model exponential moving average is effective for nonconvex optimization