A Transformer with Stack Attention
Jiaoda Li, Jennifer C. White, Mrinmaya Sachan, Ryan Cotterell
–arXiv.org Artificial Intelligence
Natural languages are believed to be (mildly) context-sensitive. Despite underpinning remarkably capable large language models, transformers are unable to model many context-free language tasks. In an attempt to address this limitation in the modeling power of transformer-based language models, we propose augmenting them with a differentiable, stack-based attention mechanism. Our stack-based attention mechanism can be incorporated into any transformer-based language model and adds a level of interpretability to the model. We show that the addition of our stack-based attention mechanism enables the transformer to model some, but not all, deterministic context-free languages.
May-13-2024
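The mechanism the abstract refers to is a stack whose operations are made differentiable, so it can be trained jointly with the transformer and read by an attention sublayer. The paper's exact formulation is not reproduced in this listing; the sketch below is only a minimal, generic differentiable stack in the style of stack-augmented neural networks, where push, pop, and no-op updates are blended by soft action weights. The class name, the fixed depth, and the update rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


class DifferentiableStack:
    """Minimal differentiable stack (illustrative sketch only).

    Each update blends three candidate stacks -- push, pop, and no-op --
    using soft action weights, so the structure stays differentiable
    end to end. Names and shapes are assumptions for illustration,
    not the paper's stack-attention mechanism.
    """

    def __init__(self, depth: int, dim: int):
        self.S = np.zeros((depth, dim))  # S[0] is the (soft) stack top

    def update(self, action_logits: np.ndarray, new_elem: np.ndarray) -> np.ndarray:
        push_w, pop_w, noop_w = softmax(action_logits)

        # Candidate stack if we push: rows shift down one slot and the
        # new element is written on top (the bottom row falls off).
        pushed = np.vstack([new_elem[None, :], self.S[:-1]])

        # Candidate stack if we pop: rows shift up one slot and an
        # all-zero row fills the bottom.
        popped = np.vstack([self.S[1:], np.zeros((1, self.S.shape[1]))])

        # Soft mixture of the three candidates.
        self.S = push_w * pushed + pop_w * popped + noop_w * self.S

        # Return the soft top-of-stack vector, which an attention
        # sublayer could read as an extra key/value.
        return self.S[0]
```

One plausible wiring in a stack-augmented transformer layer is to predict the action logits from the current hidden state and expose the returned top-of-stack vector to attention as an additional key/value pair; how the paper actually integrates the stack may differ from this sketch.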
- Country:
  - Asia
    - Middle East > Jordan (0.04)
    - Singapore (0.04)
  - Europe
    - Netherlands (0.04)
    - Switzerland > Zürich
      - Zürich (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America
    - Dominican Republic (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - California > San Diego County
        - San Diego (0.04)
      - New Jersey (0.04)
      - New York > New York County
        - New York City (0.04)
      - Rhode Island > Providence County
        - Providence (0.04)
      - Texas > Travis County
        - Austin (0.04)
- Genre:
- Research Report (1.00)
- Technology: