A Memory usage compared to 16-bit precision

Neural Information Processing Systems 

The lower the C4 validation perplexity, the more outliers are present.