Supplementary Material of "Designing Robust Transformers using Robust Kernel Density Estimation"

A The Non-parametric Regression Perspective of Self-Attention

Oct-10-2025, 23:37:48 GMT–Neural Information Processing Systems

Proposition 1. Assume the robust loss function ρ is non-decreasing in [0, ∞], ρ(0) = 0 and … The proof of Proposition 1 is mainly adapted from the proof in Kim & Scott (2012). For any given function …: ℝ → … We first introduce a few notations that are useful for stating this result. … B > (2 + …) |O|, where … is the failure probability. By adapting Lemma 1 in Nguyen et al. (2022c) to a uniform concentration bound, … ImageNet: we use the full ImageNet dataset that contains 1 …
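The excerpt builds on the robust kernel density estimator of Kim & Scott (2012). As a minimal sketch, not the authors' implementation, the Python/NumPy code below fits a robust KDE with the kernelized iteratively re-weighted least squares scheme from that line of work. It assumes an unnormalized Gaussian kernel and a Huber loss as the robust loss ρ (the Huber loss is non-decreasing on [0, ∞] with ρ(0) = 0); the function names and the parameters `sigma`, `a`, `n_iter`, and `tol` are hypothetical choices for illustration. Each point's weight is proportional to φ(r) = ρ'(r)/r evaluated at its RKHS distance r from the current estimate, so points far from the bulk of the data are down-weighted.

```python
import numpy as np

def gaussian_gram(x, y, sigma):
    """Unnormalized Gaussian kernel matrix: k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def huber_phi(r, a):
    """phi(r) = rho'(r) / r for the Huber loss (assumed choice): 1 if r <= a, else a / r."""
    return np.where(r <= a, 1.0, a / np.maximum(r, 1e-12))

def robust_kde_weights(x, sigma=1.0, a=0.5, n_iter=50, tol=1e-8):
    """Hypothetical helper: IRLS weights w such that f = sum_i w_i k(., x_i)."""
    n = x.shape[0]
    K = gaussian_gram(x, x, sigma)
    w = np.full(n, 1.0 / n)  # start from the ordinary KDE (uniform weights)
    for _ in range(n_iter):
        # Squared RKHS distance ||Phi(x_i) - f||^2 = K_ii - 2 (K w)_i + w^T K w
        d2 = np.diag(K) - 2 * K @ w + w @ K @ w
        r = np.sqrt(np.maximum(d2, 0.0))
        phi = huber_phi(r, a)
        w_new = phi / phi.sum()  # re-normalize so the weights sum to one
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w

def robust_kde(query, x, w, sigma=1.0):
    """Evaluate f(q) = sum_i w_i k(q, x_i); unnormalized kernel, so defined up to a constant."""
    return gaussian_gram(query, x, sigma) @ w

# Usage: a tight cluster of far-away outliers should receive smaller weights.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 0.5, size=(100, 2))
outliers = rng.normal(6.0, 0.1, size=(5, 2))
x = np.vstack([inliers, outliers])
w = robust_kde_weights(x, sigma=1.0, a=0.5)
print(w[:100].mean() > w[100:].mean())
```

With uniform weights this reduces to the standard KDE; the Huber threshold `a` controls how aggressively distant points are down-weighted, and its scale depends on the kernel normalization assumed here.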

artificial intelligence, loss function, machine learning



Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (0.65)
  - Representation & Reasoning > Uncertainty (0.41)

a766f56d2da42cae20b5652970ec04ef-Supplemental-Conference.pdf
