Clarifying orthography: Orthographic transparency as compressibility
Torres, Charles J., Futrell, Richard
arXiv.org Artificial Intelligence
Orthographic transparency -- how directly spelling is related to sound -- lacks a unified, script-agnostic metric. Using ideas from algorithmic information theory, we quantify orthographic transparency in terms of the mutual compressibility between orthographic and phonological strings. Our measure provides a principled way to combine two factors that decrease orthographic transparency, capturing both irregular spellings and rule complexity in one quantity. We estimate our transparency measure using prequential code-lengths derived from neural sequence models. Evaluating 22 languages across a broad range of script types (alphabetic, abjad, abugida, syllabic, logographic) confirms common intuitions about relative transparency of scripts. Mutual compressibility offers a simple, principled, and general yardstick for orthographic transparency.
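As a rough illustration of the prequential code-length idea mentioned in the abstract, the sketch below codes each symbol with a model fitted only to the symbols that came before it, then sums the resulting bit costs. It is not the authors' estimator. It swaps the neural sequence models for a simple adaptive add-one count model, assumes letters and sounds are aligned one to one, and uses invented toy strings. The "saving" it reports (bits for the sounds alone minus bits for the sounds given the letters) is only one plausible reading of mutual compressibility, since the abstract does not give the formula. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def prequential_bits(symbols, alphabet_size):
    """Bits needed when each symbol is coded with an add-one estimate
    fitted only to the symbols seen before it (assumed stand-in model)."""
    counts = Counter()
    total_bits = 0.0
    for seen, s in enumerate(symbols):
        p = (counts[s] + 1) / (seen + alphabet_size)
        total_bits -= math.log2(p)
        counts[s] += 1  # update only after the symbol has been coded
    return total_bits


def conditional_prequential_bits(targets, contexts, alphabet_size):
    """Same scheme, but with a separate adaptive model per context symbol.
    Assumes targets and contexts are aligned one to one (toy assumption)."""
    counts = defaultdict(Counter)
    totals = Counter()
    total_bits = 0.0
    for t, c in zip(targets, contexts):
        p = (counts[c][t] + 1) / (totals[c] + alphabet_size)
        total_bits -= math.log2(p)
        counts[c][t] += 1
        totals[c] += 1
    return total_bits


def saving(sounds, letters, alphabet_size):
    """Bits saved on the sounds by also seeing the letters (illustrative)."""
    return (prequential_bits(sounds, alphabet_size)
            - conditional_prequential_bits(sounds, letters, alphabet_size))


# Invented toy data: lowercase = letters, uppercase = sounds.
letters = "abababab"
transparent_sounds = "ABABABAB"  # each letter always maps to the same sound
opaque_sounds = "ABBAABBA"       # each letter maps to either sound

transparent_saving = saving(transparent_sounds, letters, alphabet_size=2)
opaque_saving = saving(opaque_sounds, letters, alphabet_size=2)
print(transparent_saving > opaque_saving)
```

In this toy setting the consistent mapping yields the larger saving, which matches the intuition that a transparent orthography makes the sound string easier to compress once the spelling is known. Because the model's learning cost is included in the total, rule complexity and irregular spellings both show up in the same bit count.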
May-21-2025