Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Neural Information Processing Systems
Non-autoregressive Transformers (NATs) have recently been applied to direct speech-to-speech translation systems, which convert speech across languages without intermediate text data. Although NATs generate high-quality outputs and offer faster inference than autoregressive models, they tend to produce incoherent and repetitive results because of the complex data distribution (e.g., acoustic and linguistic variations in speech).
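The link between a complex, multimodal data distribution and repetitive output is commonly explained as follows: a NAT predicts every output position in parallel and independently, so when several valid outputs exist, different positions can "follow" different ones. The Python sketch that follows illustrates this with a hypothetical toy distribution over three valid translations. Text tokens stand in for the speech targets for readability, and the data, function names, and greedy decoders are illustrative assumptions, not the normalization method proposed in this paper.

```python
from collections import defaultdict

# Hypothetical toy target distribution: three valid translations of one
# source utterance, padded to a common length, with their probabilities.
MODES = [
    (("many", "thanks", "<pad>"), 0.40),
    (("thanks", "a", "lot"), 0.35),
    (("thanks", "so", "much"), 0.25),
]
LENGTH = 3


def marginal(position, prefix=()):
    """P(token at `position` | prefix), computed from the mode mixture."""
    scores = defaultdict(float)
    for tokens, prob in MODES:
        # Keep only the modes consistent with the tokens decoded so far.
        if tokens[:len(prefix)] == tuple(prefix):
            scores[tokens[position]] += prob
    total = sum(scores.values())
    return {tok: p / total for tok, p in scores.items()}


def argmax(dist):
    return max(dist, key=dist.get)


def non_autoregressive_decode():
    # All positions are predicted in parallel and independently:
    # no position sees what the others chose.
    return [argmax(marginal(i)) for i in range(LENGTH)]


def autoregressive_decode():
    # Greedy left-to-right decoding: each step conditions on its prefix.
    prefix = []
    for i in range(LENGTH):
        prefix.append(argmax(marginal(i, prefix)))
    return prefix


if __name__ == "__main__":
    print("NAT:", " ".join(non_autoregressive_decode()))
    print("AR:", " ".join(autoregressive_decode()))
```

In this toy setting, the independent per-position argmax yields "thanks thanks <pad>", a repetitive sequence that matches none of the three valid translations, while left-to-right decoding that conditions on its own prefix recovers "thanks a lot". The autoregressive decoder avoids the problem only by giving up parallelism, which is the speed advantage of NATs.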
May-29-2025, 04:39:03 GMT