CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing

Rehman, Abdul, Zhang, Jian-Jun, Yang, Xiaosong

Aug-22-2025–arXiv.org Artificial Intelligence

Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks require pure phoneme representations free from contextual influence, which motivated our development of CUPE - a lightweight model that captures key phoneme features in just 120 milliseconds, about one phoneme's length. CUPE processes short, fixed-width windows independently and, despite fewer parameters than current approaches, achieves competitive cross-lingual performance by learning fundamental acoustic patterns common to all languages. Our extensive evaluation through supervised and self-supervised training on diverse languages, including zero-shot tests on the UCLA Phonetic Corpus, demonstrates strong cross-lingual generalization and reveals that effective universal speech processing is possible through modeling basic acoustic patterns within phoneme-length windows.

machine learning, natural language, recognition, (16 more...)

arXiv.org Artificial Intelligence

Aug-22-2025

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom (0.14)

Genre:
- Research Report > New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Natural Language (0.88)
  - Machine Learning > Neural Networks (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found