Supplementary Material of "Designing Robust Transformers using Robust Kernel Density Estimation"

A The Non-parametric Regression Perspective of Self-Attention