Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
