Light-Weight Fault Tolerant Attention for Large Language Model Training
