Understanding and Minimising Outlier Features in Transformer Training

Bobby He