COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
