Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matthew Le, Bowen Shi, Brian Karrer
–Neural Information Processing Systems
Large-scale generative models such as GPT and DALL-E have revolutionized the research community. These models not only generate high-fidelity outputs but are also generalists that can solve tasks they were not explicitly taught. In contrast, speech generative models are still primitive in terms of scale and task generalization.
Jun-2-2025, 11:48:20 GMT
- Country:
- North America (0.14)
- Genre:
- Research Report > New Finding (0.46)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (0.87)
- Natural Language (1.00)
- Speech > Speech Recognition (1.00)
- Vision (1.00)