Understanding and Minimising Outlier Features in Neural Network Training