Clarifying orthography: Orthographic transparency as compressibility
Torres, Charles J., Futrell, Richard
arXiv.org Artificial Intelligence
Orthographic transparency (how directly spelling is related to sound) lacks a unified, script-agnostic metric. Using ideas from algorithmic information theory, we quantify orthographic transparency in terms of the mutual compressibility between orthographic and phonological strings. Our measure provides a principled way to combine two factors that decrease orthographic transparency, capturing both irregular spellings and rule complexity in one quantity. We estimate our transparency measure using prequential code-lengths derived from neural sequence models. Evaluating 22 languages across a broad range of script types (alphabetic, abjad, abugida, syllabic, logographic) confirms common intuitions about relative transparency of scripts. Mutual compressibility offers a simple, principled, and general yardstick for orthographic transparency.
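To make the idea of a prequential code length concrete, here is a minimal sketch in Python. It is not the authors' estimator: the abstract describes neural sequence models, whereas this toy swaps in an adaptive add-one count model. It also assumes, purely for illustration, that graphemes and phonemes come as pre-aligned symbol pairs, and it uses the bits saved by conditioning on the spelling as a rough stand-in for compressibility. The function name `prequential_bits` and both toy datasets are hypothetical.

```python
import math
from collections import defaultdict


def prequential_bits(symbols, contexts=None, alphabet_size=None):
    """Prequential code length (in bits) under an adaptive add-one model.

    Each symbol is coded with probabilities estimated only from the symbols
    seen before it; the counts are updated afterwards. If `contexts` is
    given, a separate count table is kept per context (a conditional code).
    """
    if contexts is None:
        contexts = [None] * len(symbols)
    k = alphabet_size or len(set(symbols))
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for ctx, sym in zip(contexts, symbols):
        # Add-one (Laplace) estimate from past data only.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + k)
        bits -= math.log2(p)
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits


# Toy "transparent" script: each grapheme always maps to the same phoneme.
graphemes = list("abc" * 20)
transparent_phonemes = [g.upper() for g in graphemes]

# Toy "opaque" script: the phoneme does not depend on the aligned grapheme.
opaque_phonemes = ["ABC"[(i // 3) % 3] for i in range(len(graphemes))]


def saving(phonemes):
    # Bits saved when the phonemes are coded with the spelling as context.
    return prequential_bits(phonemes) - prequential_bits(phonemes, graphemes)


transparent_saving = saving(transparent_phonemes)
opaque_saving = saving(opaque_phonemes)
print(f"transparent: {transparent_saving:.1f} bits saved")
print(f"opaque:      {opaque_saving:.1f} bits saved")
```

In this toy setting, knowing the spelling saves many bits for the transparent mapping and none for the opaque one. Because the model's learning cost is included in the code length, a mapping with more rules to learn also costs more bits, which is one way to see how a single quantity can reflect both irregularity and rule complexity.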
May-21-2025
- Country:
  - Europe
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Germany > Saxony
      - Leipzig (0.04)
  - North America > United States
    - California
      - Orange County > Irvine (0.04)
      - Santa Clara County > Mountain View (0.04)
    - New Jersey > Hudson County
      - Hoboken (0.04)
- Genre:
  - Research Report > New Finding (0.68)
- Technology: