Universal-2-TF: Robust All-Neural Text Formatting for ASR
Yash Khare, Taufiquzzaman Peyash, Andrea Vanzo, Takuya Yoshioka
arXiv.org Artificial Intelligence
This paper introduces an all-neural text formatting (TF) model designed for commercial automatic speech recognition (ASR) systems, encompassing punctuation restoration (PR), truecasing, and inverse text normalization (ITN). Unlike traditional rule-based or hybrid approaches, this method leverages a two-stage neural architecture comprising a multi-objective token classifier and a sequence-to-sequence (seq2seq) model. This design minimizes computational costs and reduces hallucinations while ensuring flexibility and robustness across diverse linguistic entities and text domains. Developed as part of the Universal-2 ASR system, the proposed method demonstrates superior performance in TF accuracy, computational efficiency, and perceptual quality, as validated through comprehensive evaluations using both objective and subjective methods. This work underscores the importance of holistic TF models in enhancing ASR usability in practical settings.
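The two-stage idea can be illustrated with a toy sketch. The abstract does not say how the two stages are connected, so the following is an assumption rather than the authors' implementation: the token classifier is assumed to emit, for each token, a punctuation label, a capitalization flag, and a flag marking spans that are handed to the seq2seq model. The names `TokenLabels`, `format_text`, and `toy_rewrite` are hypothetical, the labels are written by hand, and a lookup table stands in for the neural seq2seq model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TokenLabels:
    """Hypothetical per-token output of the stage-1 classifier (assumed, not from the paper)."""
    punct: str = ""              # punctuation to append after the token
    cap: bool = False            # capitalize the first letter
    needs_rewrite: bool = False  # token belongs to a span passed to stage 2


def format_text(
    tokens: List[str],
    labels: List[TokenLabels],
    rewrite: Callable[[List[str]], str],
) -> str:
    """Apply stage-1 labels directly; call stage 2 only on flagged spans."""
    out = []
    i = 0
    while i < len(tokens):
        if labels[i].needs_rewrite:
            # Collect the maximal run of flagged tokens and rewrite it as one unit.
            j = i
            while j < len(tokens) and labels[j].needs_rewrite:
                j += 1
            piece = rewrite(tokens[i:j]) + labels[j - 1].punct
            i = j
        else:
            word = tokens[i]
            if labels[i].cap:
                word = word[0].upper() + word[1:]
            piece = word + labels[i].punct
            i += 1
        out.append(piece)
    return " ".join(out)


def toy_rewrite(span: List[str]) -> str:
    """Lookup-table stand-in for a seq2seq model (illustrative only)."""
    table = {"three thirty p m": "3:30 PM"}
    key = " ".join(span)
    return table.get(key, key)


tokens = "the meeting is at three thirty p m on friday".split()
labels = [TokenLabels() for _ in tokens]
labels[0].cap = True                 # sentence-initial capital
for k in range(4, 8):                # "three thirty p m" goes to stage 2
    labels[k].needs_rewrite = True
labels[9].cap = True                 # proper noun
labels[9].punct = "."                # sentence-final period

result = format_text(tokens, labels, toy_rewrite)
print(result)
```

Under these assumptions, most tokens are formatted by cheap per-token labels and the generative step sees only short flagged spans, which is one plausible reading of how such a design could limit both compute and the room for hallucination.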
Jan-10-2025
- Country:
  - Asia
    - South Korea > Incheon
      - Incheon (0.04)
    - Thailand > Phuket
      - Phuket (0.04)
    - Vietnam > Thái Bình Province
      - Thái Bình (0.04)
  - Europe
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Middle East > Malta (0.04)
    - Switzerland (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - United States
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New Jersey > Essex County
        - Newark (0.04)
      - Texas > Travis County
        - Austin (0.04)
- Genre:
  - Research Report (0.64)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (1.00)
    - Natural Language (1.00)
    - Representation & Reasoning (0.88)
    - Speech > Speech Recognition (0.69)