SpeechAlign: Aligning Speech Generation to Human Preferences

May-27-2025, 02:17:20 GMT–Neural Information Processing Systems

Speech language models have advanced significantly in generating realistic speech, with neural codec language models standing out. However, preference optimization for aligning speech outputs with human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training and inference phases that degrade performance. We then explore leveraging preference optimization to bridge the distribution gap. We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences.

artificial intelligence, language model, speechalign
