Subword models struggle with word learning, but surprisal hides it

Bastian Bunzeck, Sina Zarrieß

arXiv.org Artificial Intelligence 

When humans acquire first language(s), they first learn to recognize single words before understanding the grammatical processes governing them (Tomasello, 1992; Behrens, 2021). This simple fact about language acquisition has found surprisingly little attention in the increasing amount of work that treats LMs as models of language learners (Warstadt and Bowman, 2022; Portelance and Jasbi, 2024). While word learning in children is well studied, the implicit word learning processes in LMs are not. Current studies overwhelmingly focus on syntax (Mueller et al., 2022; Choshen et al., 2022), or investigate word learning in close connection to syntax through surprisal (Chang and Bergen, 2022; Portelance et al., 2023; Shafiabadi and Wisniewski, 2025; Ficarra et al., 2025). Architecture-wise, a key limitation to the precise study of word learning is subword tokenization (e.g.

In contrast, subword LMs of all sizes perform much worse in a syntax-independent lexical decision setting and only reach comparable accuracy when stimuli are measured through surprisal ("unexpectedness") in syntactic contexts. By comparing word and syntactic learning (measured via BLiMP, Warstadt et al., 2020), we further find that character models quickly acquire lexical knowledge and only later develop syntactic knowledge. In subword models, however, word learning happens later and concurrently with syntax learning, bringing further evidence against the cognitive plausibility of subword tokenization. This shows how elementary decisions (like choice of tokenization) can tremendously influence the learning dynamics and trajectories that can be observed in LMs, a fact that should receive more scrutiny in studies of LMs.
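Since the comparison above hinges on how surprisal is computed, a brief illustration may help. Surprisal is the negative log-probability an LM assigns to a token in context, -log2 p(w_t | w_<t); summing it over a stimulus gives the kind of score used in surprisal-based lexical decision setups. Below is a minimal sketch of this standard computation, not the paper's own code: it assumes a Hugging Face transformers causal LM, and "gpt2" plus the "dog"/"dag" stimulus pair are placeholders (the study trains its own character- and subword-level models on its own stimuli).

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "gpt2" is a stand-in model, not one of the paper's LMs.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def surprisal_bits(text: str) -> float:
        """Total surprisal of `text` in bits: -log2 p(tokens)."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # With labels=ids the model returns the mean cross-entropy
            # (in nats) over the len-1 predicted positions.
            mean_nats = model(input_ids=ids, labels=ids).loss.item()
        n_predicted = ids.size(1) - 1
        return mean_nats * n_predicted / math.log(2)

    # Surprisal-based lexical decision: the real word should come out
    # less surprising than a matched non-word in a syntactic frame.
    frame = "She saw the {} yesterday."
    for item in ("dog", "dag"):
        print(item, surprisal_bits(frame.format(item)))

Because the sum of token log-probabilities is defined the same way whether tokens are subwords or single characters, this score is comparable across tokenization schemes, which is what allows the paper to contrast character and subword models on the same stimuli.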
