Interview with Yuki Mitsufuji: Text-to-sound generation
Earlier this year, we spoke to Yuki Mitsufuji, Lead Research Scientist at Sony AI, about his work on several aspects of image generation. Yuki and his team have since extended their work to sound generation, presenting a paper at ICLR 2025 entitled "SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation". We caught up with Yuki to find out more.
Creating sounds for multimedia such as video games and movies takes a lot of experimentation, as artists try to match sounds to their evolving creative ideas. New high-quality diffusion-based Text-to-Sound (T2S) generative models can help with this process, but they are often slow, which makes it harder for creators to experiment quickly.
Jul-29-2025, 13:29:45 GMT