Understanding SGD with Exponential Moving Average: A Case Study in Linear Regression