Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation

May-29-2025, 04:39:03 GMT–Neural Information Processing Systems

Non-autoregressive Transformers (NATs) have recently been applied in direct speech-to-speech translation systems, which convert speech across different languages without intermediate text data. Although NATs generate high-quality outputs and offer faster inference than autoregressive models, they tend to produce incoherent and repetitive results due to the complex data distribution (e.g., acoustic and linguistic variations in speech).

machine learning, natural language, translation

Country:
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:
- Research Report > Experimental Study (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)
  - Natural Language > Machine Translation (1.00)
  - Speech > Speech Recognition (1.00)
  - Vision (1.00)