Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers