MeCeFO: Enhancing LLM Training Robustness via Fault-Tolerant Optimization
