Dynamic Stashing Quantization for Efficient Transformer Training
