Interview with Yuki Mitsufuji: Text-to-sound generation

Jul-29-2025, 13:29:45 GMT–AIHub

Earlier this year, we spoke to Yuki Mitsufuji, Lead Research Scientist at Sony AI, about work on different aspects of image generation. Yuki and his team have since extended their research to sound generation, presenting a paper at ICLR 2025 entitled "SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation". We caught up with Yuki to find out more.

Creating sounds for different types of multimedia, such as video games and movies, takes a lot of experimenting, as artists try to match sounds to their evolving creative ideas. New high-quality diffusion-based Text-to-Sound (T2S) generative models can help with this process, but they are often slow, which makes it harder for creators to experiment quickly.

artificial intelligence, machine learning, yuki mitsufuji

News Web Page

Genre:
- Research Report (0.73)
- Personal > Interview (0.57)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.53)
  - Vision (0.37)