Quartet: Native FP4 Training Can Be Optimal for Large Language Models

Neural Information Processing Systems 

Training large language models (LLMs) directly in low precision offers a way to address computational costs by improving both throughput and energy efficiency.