Supplementary Material of "Designing Robust Transformers using Robust Kernel Density Estimation"
A The Non-parametric Regression Perspective of Self-Attention
–Neural Information Processing Systems
Proposition 1. Assume the robust loss function is non-decreasing in [0, 1], (0) = 0 and
The proof of Proposition 1 is mainly adapted from the proof in Kim & Scott (2012).
For any given function: R!
We first introduce a few notations that are useful for stating this result.
B> (2 +) |O| where is the failure probability.
By adapting Lemma 1 in Nguyen et al. (2022c) to uniform concentration bound,
ImageNet We use the full ImageNet dataset that contains 1 .
Oct-10-2025, 23:37:48 GMT