Designing Robust Transformers using Robust Kernel Density Estimation

Jan-19-2025, 18:13:28 GMT–Neural Information Processing Systems

Transformer-based architectures have recently achieved remarkable success in domains well beyond large language models. However, existing approaches typically focus on predictive accuracy and computational cost, largely ignoring other practical concerns such as robustness to contaminated samples. In this paper, by re-interpreting the self-attention mechanism as a non-parametric kernel density estimator, we adapt classical robust kernel density estimation methods to develop novel classes of transformers that are resistant to adversarial attacks and data contamination. We propose methods that down-weight outliers in the reproducing kernel Hilbert space (RKHS) when computing the self-attention operations. We empirically show that these methods outperform existing state-of-the-art methods, particularly on image data under adversarial attacks.
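The kernel-density view of attention can be made concrete with a small sketch. This is not the authors' implementation. It assumes a Gaussian kernel with bandwidth `sigma`, which coincides with softmax attention (with temperature `sigma**2`) when all keys share the same norm. It also swaps in a classical Huber-loss robust KDE, fitted by iteratively reweighted least squares, to produce per-key weights. Each key is weighted by its RKHS distance from the current density estimate, and the weighted kernel then replaces the uniform one in the attention average. The function names `gaussian_kernel`, `huber_key_weights`, and `robust_attention` and the Huber threshold `c` are hypothetical, chosen only for illustration.

```python
import math
from typing import List

Vector = List[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) (assumed kernel choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def huber_key_weights(keys: List[Vector], sigma: float, c: float, iters: int = 10) -> List[float]:
    """Hypothetical helper: IRLS weights of a Huber-loss robust KDE over the keys."""
    n = len(keys)
    K = [[gaussian_kernel(a, b, sigma) for b in keys] for a in keys]
    w = [1.0 / n] * n  # start from the ordinary (uniformly weighted) KDE
    for _ in range(iters):
        # ||f||^2 for f = sum_l w_l Phi(k_l), via the kernel trick
        f_norm_sq = sum(w[l] * w[m] * K[l][m] for l in range(n) for m in range(n))
        raw = []
        for j in range(n):
            # ||Phi(k_j) - f||^2 = k(k_j, k_j) - 2 <Phi(k_j), f> + ||f||^2
            sq = K[j][j] - 2.0 * sum(w[l] * K[j][l] for l in range(n)) + f_norm_sq
            r = math.sqrt(max(sq, 0.0))
            # Huber psi(r)/r: full weight inside radius c, c/r outside (down-weights outliers)
            raw.append(1.0 if r <= c else c / r)
        total = sum(raw)
        w = [x / total for x in raw]
    return w


def robust_attention(queries: List[Vector], keys: List[Vector], values: List[Vector],
                     sigma: float, c: float) -> List[Vector]:
    """Kernel-regression attention with robust per-key weights (illustrative sketch)."""
    w = huber_key_weights(keys, sigma, c)
    dim = len(values[0])
    out = []
    for q in queries:
        # weighted kernel scores replace the uniform weights of standard attention
        scores = [w[j] * gaussian_kernel(q, keys[j], sigma) for j in range(len(keys))]
        z = sum(scores)
        out.append([sum(s * v[d] for s, v in zip(scores, values)) / z for d in range(dim)])
    return out


if __name__ == "__main__":
    keys = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]  # last key is an outlier
    values = [[1.0], [1.0], [1.0], [100.0]]
    print(huber_key_weights(keys, sigma=1.0, c=0.5))
    # a very large c gives uniform weights, i.e. ordinary kernel attention
    print(robust_attention([[2.5, 2.5]], keys, values, sigma=1.0, c=1e9))
    print(robust_attention([[2.5, 2.5]], keys, values, sigma=1.0, c=0.5))
```

In this toy run the outlying key at (5, 5) receives a smaller weight than the three clustered keys. The robust output for a query lying between the cluster and the outlier is therefore pulled less toward the outlier's value than the uniformly weighted baseline. Setting `c` very large recovers ordinary attention, which makes the robust variant a drop-in reweighting rather than a new architecture.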

designing robust transformer, robust kernel density estimation, robustness
