AITopics | speechalign

Collaborating Authors

speechalign

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SpeechAlign: Aligning Speech Generation to Human Preferences

Neural Information Processing SystemsMar-20-2026, 19:10:23 GMT

Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of preference optimization to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training and inference phases, which negatively affects performance. Then we explore leveraging preference optimization to bridge the distribution gap. We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences. SpeechAlign involves constructing a preference codec dataset contrasting golden codec tokens against synthetic tokens, followed by preference optimization to improve the codec language model. This cycle of improvement is carried out iteratively to steadily convert weak models to strong ones. Through both subjective and objective evaluations, we show that SpeechAlign can bridge the distribution gap and facilitating continuous self-improvement of the speech language model. Moreover, SpeechAlign exhibits robust generalization capabilities and works for smaller models.

artificial intelligence, language model, proceedings, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

5a016da670821af25f151f523a2e563f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-14-2026, 05:47:01 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
(2 more...)

Add feedback

5a016da670821af25f151f523a2e563f-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 03:33:21 GMT

codec language model, dataset, language model, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (0.93)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
(2 more...)

Add feedback

SpeechAlign: Aligning Speech Generation to Human Preferences

Neural Information Processing SystemsMay-27-2025, 02:17:20 GMT

artificial intelligence, language model, speechalign, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

SpeechAlign: Aligning Speech Generation to Human Preferences

Zhang, Dong, Li, Zhaowei, Li, Shimin, Zhang, Xin, Wang, Pengyu, Zhou, Yaqian, Qiu, Xipeng

arXiv.org Artificial IntelligenceApr-8-2024

Speech language models have significantly advanced in generating realistic speech, with neural codec language models standing out. However, the integration of human feedback to align speech outputs to human preferences is often neglected. This paper addresses this gap by first analyzing the distribution gap in codec language models, highlighting how it leads to discrepancies between the training and inference phases, which negatively affects performance. Then we explore leveraging learning from human feedback to bridge the distribution gap. We introduce SpeechAlign, an iterative self-improvement strategy that aligns speech language models to human preferences. SpeechAlign involves constructing a preference codec dataset contrasting golden codec tokens against synthetic tokens, followed by preference optimization to improve the codec language model. This cycle of improvement is carried out iteratively to steadily convert weak models to strong ones. Through both subjective and objective evaluations, we show that SpeechAlign can bridge the distribution gap and facilitating continuous self-improvement of the speech language model. Moreover, SpeechAlign exhibits robust generalization capabilities and works for smaller models. Code and models will be available at https://github.com/0nutation/SpeechGPT.

codec language model, human feedback, language model, (16 more...)

arXiv.org Artificial Intelligence

2404.056

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

SpeechAlign: a Framework for Speech Translation Alignment Evaluation

Alastruey, Belen, Sant, Aleix, Gállego, Gerard I., Dale, David, Costa-jussà, Marta R.

arXiv.org Artificial IntelligenceSep-20-2023

Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. To contribute to these fields, we present SpeechAlign, a framework to evaluate the underexplored field of source-target alignment in speech models. Our framework has two core components. First, to tackle the absence of suitable evaluation datasets, we introduce the Speech Gold Alignment dataset, built upon a English-German text translation gold alignment dataset. Secondly, we introduce two novel metrics, Speech Alignment Error Rate (SAER) and Time-weighted Speech Alignment Error Rate (TW-SAER), to evaluate alignment quality in speech models. By publishing SpeechAlign we provide an accessible evaluation framework for model assessment, and we employ it to benchmark open-source Speech Translation models.

speech translation alignment evaluation, speechalign

arXiv.org Artificial Intelligence

2309.11585

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Add feedback