outlier
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (5 more...)
- Europe > France (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Russia (0.04)
- (3 more...)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.72)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Virginia (0.04)
- Asia > Middle East > Israel (0.04)
- North America > Canada (0.04)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
Designing Robust Transformers using Robust Kernel Density Estimation
Transformer-based architectures have recently exhibited remarkable successes across different domains beyond just powering large language models. However, existing approaches typically focus on predictive accuracy and computational cost, largely ignoring certain other practical issues such as robustness to contaminated samples. In this paper, by re-interpreting the self-attention mechanism as a non-parametric kernel density estimator, we adapt classical robust kernel density estimation methods to develop novel classes of transformers that are resistant to adversarial attacks and data contamination. We first propose methods that down-weight outliers in RKHS when computing the self-attention operations. We empirically show that these methods produce improved performance over existing state-of-the-art methods, particularly on image data under adversarial attacks. Then we leverage the median-of-means principle to obtain another efficient approach that results in noticeably enhanced performance and robustness on language modeling and time series classification tasks. Our methods can be combined with existing transformers to augment their robust properties, thus promising to impact a wide variety of applications.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
- Information Technology > Security & Privacy (0.68)
- Government > Military (0.54)