Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?