Understanding and Minimising Outlier Features in Transformer Training