Attention on Multiword Expressions: A Multilingual Study of BERT-based Models with Regard to Idiomaticity and Microsyntax
Zaitova, Iuliia, Hirak, Vitalii, Abdullah, Badr M., Klakow, Dietrich, Möbius, Bernd, Avgustinova, Tania
–arXiv.org Artificial Intelligence
This study analyzes the attention patterns of fine-tuned encoder-only models based on the BERT architecture (BERT-based models) towards two distinct types of Multiword Expressions (MWEs): idioms and microsyntactic units (MSUs). Idioms present challenges in semantic non-compositionality, whereas MSUs demonstrate unconventional syntactic behavior that does not conform to standard grammatical categorizations. We aim to understand whether fine-tuning BERT-based models on specific tasks influences their attention to MWEs, and how this attention differs between semantic and syntactic tasks. We examine attention scores to MWEs in both pre-trained and fine-tuned BERT-based models. We utilize monolingual models and datasets in six Indo-European languages - English, German, Dutch, Polish, Russian, and Ukrainian. Our results show that fine-tuning significantly influences how models allocate attention to MWEs. Specifically, models fine-tuned on semantic tasks tend to distribute attention to idiomatic expressions more evenly across layers. Models fine-tuned on syntactic tasks show an increase in attention to MSUs in the lower layers, corresponding with syntactic processing requirements.
arXiv.org Artificial Intelligence
May-12-2025
- Country:
- Asia > Japan
- Honshū > Kansai
- Osaka Prefecture > Osaka (0.04)
- Kyūshū & Okinawa > Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Honshū > Kansai
- Europe
- Germany > Saarland (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Ukraine > Kyiv Oblast
- Kyiv (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- United Kingdom > England
- Oxfordshire > Oxford (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Bulgaria > Varna Province
- Varna (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- United States
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Washington > King County
- Seattle (0.04)
- Minnesota > Hennepin County
- Canada
- Asia > Japan
- Genre:
- Research Report > New Finding (1.00)
- Technology: