MeCeFO: Enhancing LLM Training Robustness via Fault-Tolerant Optimization
