Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
–Neural Information Processing Systems
Large-scale generative models such as GPT and DALL-E have revolutionized the research community. These models not only generate high-fidelity outputs, but are also generalists that can solve tasks they were not explicitly taught. In contrast, speech generative models are still primitive in terms of scale and task generalization.
Dec-24-2025, 09:51:55 GMT