Perception of Phonological Assimilation by Neural Speech Recognition Models
Pouw, Charlotte, Kloots, Marianne de Heer, Alishahi, Afra, Zuidema, Willem
arXiv.org Artificial Intelligence
Any speech recognition system must learn to recognize the intended words regardless of the various ways in which those words may be pronounced. A substantial amount of the variability in speech is systematic, arising from phonological processes that occur in predictable environments. One such process is place assimilation, where phonemes adopt the place of articulation of adjacent phonemes. For instance, the word pair clean pan is frequently pronounced as clea[m] pan, with the word-final coronal /n/ in clean assimilating to the following labial [p] in pan. This is a simple yet common phonological process across the world's languages (Hura, Lindblom, and Diehl 1992). In English, it occurs for coronal segments (e.g., /t/, /d/, /n/) that are followed by noncoronals, such as labials (e.g., [p], [b], [m]) or velars (e.g., [k], [g], [ŋ]). Human listeners are able to infer the underlying /n/ when exposed to assimilated inputs like clea[m] pan, allowing them to perceive the intended word clean. This phenomenon is referred to as compensation for assimilation and happens automatically; that is, humans compensate without conscious awareness of the assimilation itself. Psycholinguistic research has used controlled stimuli to investigate the mechanism behind this process.
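The assimilation rule described above can be illustrated with a small Python sketch. It is a toy illustration only, not the models or stimuli used in the paper: the segment inventories are limited to the examples given in the text, the function names `assimilate` and `underlying_candidates` are hypothetical, and the transcriptions of clean and pan are simplified assumptions.

```python
# Toy sketch of English place assimilation, limited to the segments
# named in the text. All names here are illustrative assumptions.
LABIALS = {"p", "b", "m"}
VELARS = {"k", "g", "ŋ"}

# Assimilated counterpart of each coronal (voicing and nasality are kept).
TO_LABIAL = {"t": "p", "d": "b", "n": "m"}
TO_VELAR = {"t": "k", "d": "g", "n": "ŋ"}


def assimilate(first, second):
    """Return `first` with its final coronal assimilated to the
    place of the initial segment of `second`, if the rule applies."""
    if not first or not second:
        return list(first)
    final, following = first[-1], second[0]
    if following in LABIALS and final in TO_LABIAL:
        return first[:-1] + [TO_LABIAL[final]]
    if following in VELARS and final in TO_VELAR:
        return first[:-1] + [TO_VELAR[final]]
    return list(first)


def underlying_candidates(surface_final, following):
    """Return the segments that could underlie `surface_final`
    when it precedes `following` (a sketch of compensation)."""
    candidates = {surface_final}
    if following in LABIALS:
        table = TO_LABIAL
    elif following in VELARS:
        table = TO_VELAR
    else:
        return candidates
    for coronal, assimilated in table.items():
        if assimilated == surface_final:
            candidates.add(coronal)
    return candidates


# Simplified segment lists for "clean" and "pan" (assumed transcriptions).
clean = ["k", "l", "i", "n"]
pan = ["p", "æ", "n"]
surface = assimilate(clean, pan)
```

Here `surface` ends in [m], matching clea[m] pan, and `underlying_candidates("m", "p")` contains both /m/ and /n/. Choosing /n/ from that set, so that the intended word clean is recovered, is the compensation step that human listeners perform.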
Jun-21-2024