PolyIPA -- Multilingual Phoneme-to-Grapheme Conversion Model
arXiv.org Artificial Intelligence
This paper presents PolyIPA, a novel multilingual phoneme-to-grapheme conversion model designed for multilingual name transliteration, onomastic research, and information retrieval. The model leverages two helper models developed for data augmentation: IPA2vec for finding soundalikes across languages, and similarIPA for handling phonetic notation variations. Evaluated on a test set that spans multiple languages and writing systems, the model achieves a mean Character Error Rate of 0.055 and a character-level BLEU score of 0.914, with particularly strong performance on languages with shallow orthographies. The implementation of beam search further improves practical utility, with top-3 candidates reducing the effective error rate by 52.7% (to CER: 0.026), demonstrating the model's effectiveness for cross-linguistic applications.
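The abstract's headline numbers rest on Character Error Rate (CER) and on a top-3 "effective" error rate. As a minimal sketch of how such metrics are commonly computed, the Python below defines CER as Levenshtein distance divided by reference length and scores a beam by its best candidate among the first k. The function names, the per-item averaging, and the example strings are illustrative assumptions, not the paper's evaluation code. The reported figures are at least mutually consistent: 0.055 × (1 − 0.527) ≈ 0.026.

```python
from statistics import mean


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def top_k_cer(ref: str, candidates: list[str], k: int = 3) -> float:
    """Best (lowest) CER among the first k beam candidates.

    Assumption: the 'effective error rate' counts a prediction by its
    closest candidate; the paper may define it differently.
    """
    return min(cer(ref, c) for c in candidates[:k])


def mean_cer(refs: list[str], hyps: list[str]) -> float:
    """Mean of per-item CERs (assumed averaging scheme)."""
    return mean(cer(r, h) for r, h in zip(refs, hyps))


# Hypothetical beam output for one name; not data from the paper.
reference = "Schmidt"
beam = ["Shmidt", "Schmitt", "Schmidt"]
top1 = cer(reference, beam[0])
top3 = top_k_cer(reference, beam, k=3)
```

Here `top1` is 1/7 because the first candidate drops one character, while `top3` is 0 because the third candidate matches exactly, which illustrates why returning several candidates can lower the effective error rate in retrieval-style use.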
Dec-12-2024
- Country:
- Asia > Middle East > Saudi Arabia > Riyadh Province > Riyadh (0.04)
- Europe > Croatia > Zagreb County > Zagreb (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada
- Oceania > Australia
- Genre:
- Research Report > New Finding (0.68)
- Technology: