Light-Weight Fault Tolerant Attention for Large Language Model Training