Textless NLP: Generating expressive speech from raw audio

Sep-10-2021, 06:38:29 GMT–#artificialintelligence

Text-based language models such as BERT, RoBERTa, and GPT-3 have made huge strides in recent years. When given written words as input, they can generate extremely realistic text on virtually any topic. They also provide useful pretrained models (e.g., BART and XLM-R) that can be fine-tuned for a variety of difficult natural language processing (NLP) applications, including sentiment analysis, translation, information retrieval, inference, and summarization, using only a few labels or examples. There is an important limitation, however: these applications are mainly restricted to languages with very large text data sets suitable for training AI models. GSLM (Generative Spoken Language Model) sidesteps this limitation. It leverages recent breakthroughs in representation learning, which allow it to work directly from raw audio signals alone, without any labels or text.
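The core idea behind working "directly from raw audio" is to replace text with discrete units learned from speech. GSLM's released pipeline encodes audio with a self-supervised model (such as HuBERT), quantizes the frame-level features into units with k-means, trains a Transformer language model on the unit sequences, and uses a separate decoder to turn units back into speech. The Python below is only a minimal sketch of the middle two steps, under loudly stated assumptions: toy 2-D vectors stand in for real encoder features, and a bigram counter stands in for the Transformer. The names `fake_frames`, `CENTERS`, `train_bigram`, and `sample` are hypothetical and are not GSLM's code.

```python
import random
from collections import Counter, defaultdict

def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to frame p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Seed the next centroid at the point farthest from all current ones.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

def to_units(frames, centroids):
    """Quantize frames to centroid ids, then collapse consecutive repeats."""
    ids = [nearest(f, centroids) for f in frames]
    return [u for i, u in enumerate(ids) if i == 0 or u != ids[i - 1]]

def train_bigram(sequences):
    """Count unit-to-unit transitions (hypothetical stand-in for a unit LM)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def sample(counts, start, length, seed=0):
    """Generate a unit sequence by sampling observed transitions."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length and counts.get(out[-1]):
        units, weights = zip(*counts[out[-1]].items())
        out.append(rng.choices(units, weights=weights)[0])
    return out

# Hypothetical 2-D "encoder features": each phone-like symbol sits near one
# center and lasts 3-5 frames, mimicking frame-level speech representations.
CENTERS = {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (10.0, 0.0)}

def fake_frames(pattern, rng):
    frames = []
    for sym in pattern:
        cx, cy = CENTERS[sym]
        for _ in range(rng.randint(3, 5)):
            frames.append([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)])
    return frames

rng = random.Random(0)
patterns = ["abcab", "cbacb", "abcba", "bacab"]
utterances = [fake_frames(p, rng) for p in patterns]
centroids = kmeans([f for u in utterances for f in u], k=3)
unit_seqs = [to_units(u, centroids) for u in utterances]
lm = train_bigram(unit_seqs)
print(unit_seqs)
print(sample(lm, unit_seqs[0][0], length=8))
```

Collapsing consecutive repeats matters because a single speech sound spans many frames; without it, the language model would mostly learn to repeat the same unit.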
