A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness
Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni
This paper establishes a key insight: a one-layer decoder-only Transformer is equivalent to a two-layer recurrent neural network (RNN). Building on this insight, we propose ARC-Tran, a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces. In contrast to ARC-Tran, existing robustness verification techniques are limited either to specific, length-preserving perturbations such as word substitutions or to recurrent models such as LSTMs. ARC-Tran addresses these limitations by carefully managing position encodings to prevent mismatches and by exploiting the equivalence above to achieve precise and scalable verification. Our evaluation shows that ARC-Tran (1) trains models that are more robust to arbitrary perturbation spaces than those produced by existing techniques and (2) achieves high certification accuracy on the resulting models.
arXiv.org Artificial Intelligence
May-27-2024
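
To make the recurrent reading of decoder attention concrete, here is a minimal sketch (single-head attention in numpy; an illustration of the general idea, not the paper's exact construction). It computes causal self-attention in the usual batched matrix form and again as a left-to-right scan that keeps only a running softmax numerator and denominator per query. The outer loop over positions and the inner scan over each prefix are the two nested recurrences suggested by the "two-layer RNN" view.

```python
import numpy as np

def causal_attention_matrix(Q, K, V):
    """Standard one-layer decoder attention: masked softmax over the prefix."""
    T, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (T, T) dot-product scores
    mask = np.tril(np.ones((T, T), dtype=bool))   # causal mask: no future tokens
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def causal_attention_recurrent(Q, K, V):
    """Recurrent view: for each query q_t, scan the prefix once, maintaining
    an online-softmax state (running max, numerator, denominator)."""
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):                     # outer recurrence over positions
        m = -np.inf                        # running max, for numerical stability
        num = np.zeros(V.shape[1])         # running sum of exp(score) * v_i
        den = 0.0                          # running sum of exp(score)
        for i in range(t + 1):             # inner recurrence over the prefix
            s = Q[t] @ K[i] / np.sqrt(d)
            m_new = max(m, s)
            scale = np.exp(m - m_new) if np.isfinite(m) else 0.0
            num = num * scale + np.exp(s - m_new) * V[i]
            den = den * scale + np.exp(s - m_new)
            m = m_new
        out[t] = num / den
    return out

rng = np.random.default_rng(0)
T, d = 6, 4
Q, K, V = rng.normal(size=(3, T, d))
assert np.allclose(causal_attention_matrix(Q, K, V),
                   causal_attention_recurrent(Q, K, V))
```

The agreement of the two functions is what makes the recurrent formulation attractive for verification: a scan with a small per-step state is the shape of computation that abstract-interpretation-based certifiers for RNNs already handle.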