h-softmax
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
Yang, Zhengdong, Liu, Qianying, Li, Sheng, Cheng, Fei, Chu, Chenhui
We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses the limitations of the previous Huffman-based H-Softmax method, which relied on shallow features in token similarity assessments. Through experiments on a downsampled dataset of 15 languages, we demonstrate the effectiveness of our approach in improving low-resource multilingual ASR accuracy.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (9 more...)
- Research Report > New Finding (0.93)
- Research Report > Promising Solution (0.66)
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition
Liu, Qianying, Gong, Zhuo, Yang, Zhengdong, Yang, Yuhang, Li, Sheng, Ding, Chenchen, Minematsu, Nobuaki, Huang, Hao, Cheng, Fei, Chu, Chenhui, Kurohashi, Sadao
Low-resource speech recognition has been long-suffering from insufficient training data. In this paper, we propose an approach that leverages neighboring languages to improve low-resource scenario performance, founded on the hypothesis that similar linguistic units in neighboring languages exhibit comparable term frequency distributions, which enables us to construct a Huffman tree for performing multilingual hierarchical Softmax decoding. This hierarchical structure enables cross-lingual knowledge sharing among similar tokens, thereby enhancing low-resource training outcomes. Empirical analyses demonstrate that our method is effective in improving the accuracy and efficiency of low-resource speech recognition.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
- Europe > Portugal (0.04)
- (6 more...)