Enhancing Neural Spoken Language Recognition: An Exploration with Multilingual Datasets
Anidjar, Or Haim, Yozevitch, Roi
arXiv.org Artificial Intelligence
In this research, we advanced a spoken language recognition system, moving beyond traditional feature vector-based models. Our improvements focused on capturing language characteristics over extended periods of time using a specialized pooling layer. We used a broad range of data from Common Voice, targeting ten languages across the Indo-European, Semitic, and East Asian families. The major innovation was optimizing the architecture of Time Delay Neural Networks (TDNNs): we introduced additional layers and restructured the networks into a funnel shape, enhancing their ability to process complex linguistic patterns. A rigorous grid search determined the optimal settings for these networks, significantly boosting their efficiency in recognizing language patterns from audio samples. The model underwent extensive training, including a phase with augmented data, to refine its capabilities. The result is a highly accurate system that achieves 97% accuracy in language recognition. This advancement represents a notable contribution to artificial intelligence, specifically in improving the accuracy and efficiency of language processing systems, a critical aspect of engineering advanced speech recognition technologies.
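The abstract does not say which pooling layer is used. As an illustration only, the sketch below assumes statistics pooling (per-dimension mean and standard deviation over time), a common choice in TDNN-based systems; the function name `stats_pool` and the toy inputs are hypothetical. It shows how frame-level features from utterances of any length can be summarized into one fixed-size vector.

```python
import math

def stats_pool(frames):
    """Collapse a variable-length sequence of frame vectors into one vector.

    frames: list of T frame-level feature vectors, each a list of D floats.
    Returns 2*D floats: the per-dimension mean over time, followed by the
    per-dimension population standard deviation over time.
    (Assumed pooling scheme; the paper's actual layer may differ.)
    """
    if not frames:
        raise ValueError("need at least one frame")
    t = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / t for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / t)
        for d in range(dim)
    ]
    return means + stds

# Toy utterances of different lengths (2 and 4 frames, 2 features each).
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [2.0, 1.0]]

short_vec = stats_pool(short_utt)
long_vec = stats_pool(long_utt)
```

Both `short_vec` and `long_vec` have four elements even though the inputs differ in length, which is what allows a classifier after the pooling layer to operate on whole utterances.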
Jan-19-2025
- Country:
  - Asia
    - China > Jiangsu Province
      - Nanjing (0.04)
    - India > Madhya Pradesh
      - Bhopal (0.04)
    - Middle East > Israel
      - Northern District > Golan Heights (0.04)
  - North America > United States (0.04)
- Genre:
  - Research Report > New Finding (0.68)
- Industry:
  - Health & Medicine (1.00)
- Technology: