Towards a Theoretical Understanding of Batch Normalization