GaussianSpeech: Audio-Driven Gaussian Avatars
Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner
arXiv.org Artificial Intelligence
We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photorealistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose coupling the speech signal with 3D Gaussian splatting (3DGS) to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle-based and perceptual losses to synthesize facial details, including the wrinkles that appear with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from the audio input. Because high-quality datasets of talking humans paired with audio are lacking, we captured a new large-scale, multi-view dataset of audio-visual sequences of talking humans with native English accents and diverse facial geometry. GaussianSpeech consistently achieves state-of-the-art performance, producing visually natural motion at real-time rendering rates while encompassing diverse facial expressions and styles.
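The abstract does not spell out the renderer, but standard 3D Gaussian splatting draws each pixel by sorting the projected Gaussians by depth and alpha-compositing them front to back, C = Σ_i c_i α_i Π_{j<i} (1 − α_j), where α_i is a splat's opacity scaled by its 2D Gaussian falloff at that pixel. The Python sketch below illustrates that generic step for a single pixel; it is not GaussianSpeech's code. The `Splat` layout, the function names `splat_alpha` and `composite_pixel`, and the clamping thresholds (similar to those used in the reference 3DGS rasterizer) are assumptions, and the per-splat colors, which GaussianSpeech makes expression-dependent, are simply taken as fixed inputs here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Splat:
    """One Gaussian already projected to screen space (hypothetical layout)."""
    mean: tuple[float, float]          # 2D center in pixel coordinates
    cov: tuple[float, float, float]    # (a, b, c) of the 2D covariance [[a, b], [b, c]]
    depth: float                       # camera-space depth, used only for sorting
    opacity: float                     # learned opacity in [0, 1]
    color: tuple[float, float, float]  # RGB; expression-dependent in GaussianSpeech, fixed here


def splat_alpha(s: Splat, px: float, py: float) -> float:
    """Opacity of splat s at pixel (px, py): opacity * exp(-0.5 * d^T Sigma^-1 d)."""
    a, b, c = s.cov
    det = a * c - b * b
    dx, dy = px - s.mean[0], py - s.mean[1]
    # Inverse of [[a, b], [b, c]] is [[c, -b], [-b, a]] / det.
    power = -0.5 * (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return min(0.99, s.opacity * math.exp(power))  # clamp so no splat is fully opaque


def composite_pixel(splats, px, py, background=(0.0, 0.0, 0.0)):
    """Front-to-back alpha compositing: C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through nearer splats
    for s in sorted(splats, key=lambda sp: sp.depth):  # nearest splat first
        alpha = splat_alpha(s, px, py)
        if alpha < 1.0 / 255.0:  # negligible contribution, skip this splat
            continue
        for k in range(3):
            color[k] += transmittance * alpha * s.color[k]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively saturated, stop early
            break
    # Whatever transmittance remains lets the background show through.
    return tuple(color[k] + transmittance * background[k] for k in range(3))


# Usage: two half-transparent splats centered on the pixel, over a white background.
front = Splat((0.0, 0.0), (1.0, 0.0, 1.0), depth=1.0, opacity=0.5, color=(0.0, 0.0, 1.0))
back = Splat((0.0, 0.0), (1.0, 0.0, 1.0), depth=2.0, opacity=0.5, color=(1.0, 0.0, 0.0))
print(composite_pixel([back, front], 0.0, 0.0, background=(1.0, 1.0, 1.0)))  # -> (0.5, 0.25, 0.75)
```

Sorting by depth makes the result independent of the order in which splats are passed in, and the early exit on low transmittance is a common rasterizer optimization that avoids processing splats hidden behind already-saturated ones.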
Nov-27-2024