Stochastic Rounding for LLM Training: Theory and Practice