Probing BERT for German Compound Semantics
Filip Miletić, Aaron Schmid, Sabine Schulte im Walde
arXiv.org Artificial Intelligence
This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
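The abstract does not spell out how compositionality is predicted. One setup that is common in this line of work, and is assumed here purely for illustration, scores each compound by the cosine similarity between its embedding and a constituent embedding, then compares those scores with the gold ratings using Spearman's rank correlation. The minimal sketch below uses made-up three-dimensional vectors and hypothetical ratings in place of real BERT layer outputs and the 868-compound gold standard; all function and variable names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy data (NOT from the paper): each entry is
# (compound vector, constituent vector, hypothetical gold rating).
toy = [
    ([1.0, 0.0, 0.0], [1.0, 0.1, 0.0], 5.0),
    ([0.5, 0.5, 0.0], [1.0, 0.0, 0.0], 3.0),
    ([0.0, 1.0, 0.0], [1.0, 0.0, 0.2], 1.0),
]

predicted = [cosine(compound, constituent) for compound, constituent, _ in toy]
gold = [rating for _, _, rating in toy]
rho = spearman(predicted, gold)
print(f"Spearman rho on toy data: {rho:.3f}")
```

In an actual probing run, the toy vectors would be replaced by embeddings taken from a chosen layer and target token of the model, and the loop repeated across layers and across cased and uncased variants.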
May-21-2025
- Country:
  - Asia
    - Middle East
      - Republic of Türkiye > Istanbul Province
        - Istanbul (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - Singapore (0.04)
    - Thailand
      - Bangkok > Bangkok (0.04)
      - Chiang Mai > Chiang Mai (0.04)
  - Europe
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
    - Czechia > Prague (0.04)
    - Germany
      - Baden-Württemberg > Stuttgart Region
        - Stuttgart (0.04)
      - Berlin (0.04)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Netherlands (0.04)
    - Slovenia (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States > Ohio
      - Franklin County > Columbus (0.04)
  - South America > Colombia
    - Meta Department > Villavicencio (0.05)
- Genre:
  - Research Report (1.00)
- Technology: