Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Shunjie Wang, Shane Steinert-Threlkeld
–arXiv.org Artificial Intelligence
Although Transformers perform well on NLP tasks, recent studies suggest that self-attention is theoretically limited in learning even some regular and context-free languages. These findings motivated us to consider their implications for modeling natural language, which is hypothesized to be mildly context-sensitive. We test Transformers' ability to learn mildly context-sensitive languages of varying complexity, and find that they generalize well to unseen in-distribution data, but their ability to extrapolate to longer strings is worse than that of LSTMs. Our analyses show that the learned self-attention patterns and representations modeled dependency relations and demonstrated counting behavior, which may have helped the models solve the languages.
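The abstract describes a length-generalization setup: train on strings from a formal language, test on unseen in-distribution strings, and separately probe extrapolation to longer strings. As a minimal sketch of that kind of probe, the snippet below uses a^n b^n c^n as a stand-in language and a binary membership task; the specific languages, task format, and length cutoffs are assumptions for illustration, not details from this excerpt.

```python
# Sketch of a length-generalization probe for a formal-language membership task.
# Assumptions (not from the excerpt): language a^n b^n c^n, binary classification,
# train/test on n <= 50 in-distribution, extrapolate on n in 51..100.
import random

def positive(n: int) -> str:
    """A string in the language a^n b^n c^n."""
    return "a" * n + "b" * n + "c" * n

def negative(n: int) -> str:
    """A near-miss string of the same total length that is not in the language."""
    counts = [n, n, n]
    i = random.randrange(3)
    j = (i + random.randrange(1, 3)) % 3   # a different symbol than i
    counts[i] += 1
    counts[j] -= 1                          # perturb two counts, keep length 3n
    return "a" * counts[0] + "b" * counts[1] + "c" * counts[2]

def make_split(lengths) -> list[tuple[str, int]]:
    """Labeled (string, is_member) pairs for the given values of n."""
    data = []
    for n in lengths:
        data.append((positive(n), 1))
        data.append((negative(n), 0))
    return data

train = make_split(range(1, 51))               # seen lengths
in_dist_test = make_split(range(1, 51))        # unseen strings, same length range
extrapolation_test = make_split(range(51, 101))  # strictly longer strings
```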
Oct-19-2023