Probing BERT for German Compound Semantics
Filip Miletić, Aaron Schmid, Sabine Schulte im Walde
arXiv.org Artificial Intelligence
This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
May-21-2025
- Country:
  - South America > Colombia
    - Meta Department > Villavicencio (0.05)
  - North America
    - United States > Ohio
      - Franklin County > Columbus (0.04)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
    - Slovenia (0.04)
    - Netherlands (0.04)
    - Czechia > Prague (0.04)
    - Middle East > Republic of Türkiye
      - Istanbul Province > Istanbul (0.04)
    - Germany
      - Berlin (0.04)
      - Baden-Württemberg > Stuttgart Region
        - Stuttgart (0.04)
    - Croatia > Dubrovnik-Neretva County
      - Dubrovnik (0.04)
  - Asia
    - Singapore (0.04)
    - Thailand
      - Chiang Mai > Chiang Mai (0.04)
      - Bangkok > Bangkok (0.04)
    - Middle East
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
      - Republic of Türkiye > Istanbul Province
        - Istanbul (0.04)
- Genre:
  - Research Report (1.00)
- Technology: