Comparing Discrete and Continuous Space LLMs for Speech Recognition

Xu, Yaoxun, Zhang, Shi-Xiong, Yu, Jianwei, Wu, Zhiyong, Yu, Dong

Sep-1-2024–arXiv.org Artificial Intelligence

This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR). We organize the representations by feature continuity and training approach into four categories: supervised discrete, unsupervised discrete, supervised continuous, and unsupervised continuous. We further classify LLMs as continuous-space or discrete-space models according to their input and autoregressive feedback. Using specialized encoders, we compare a Joint-Training-From-Scratch Language Model (JTFS LM) with a pre-trained LLaMA2-7b and examine the effectiveness of each combination in detail. Our work marks the first extensive comparison of speech representations in LLM-based ASR and explores various modeling techniques. With a HuBERT encoder, we achieve a state-of-the-art Word Error Rate (WER) of 1.69% on LibriSpeech in an open-sourced system, offering valuable insights for advancing ASR and natural language processing (NLP) research.
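
For readers unfamiliar with the metric behind the 1.69% figure, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. The sketch below computes it with a word-level edit distance. The function name `word_error_rate` is hypothetical, whitespace tokenization with no text normalization is an assumption, and this is not the authors' scoring script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. Published WER figures are usually computed over a whole test set after text normalization, so per-utterance values from this sketch are only illustrative.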

large language model, machine learning, natural language

Country:
- Asia > China
  - Guangdong Province > Shenzhen (0.05)
  - Hong Kong (0.04)

Genre:
- Research Report (0.90)
- Overview (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
