Researchers from the Max Planck Institute for Intelligent Systems, a member of NVIDIA's NVAIL program, developed an end-to-end deep learning algorithm that takes any speech signal as input and realistically animates a wide range of adult faces.

"There is an extensive literature on estimating 3D face shape, facial expressions, and facial motion from images and videos. Less attention has been paid to estimating 3D properties of faces from sound," the researchers stated in their paper. "Understanding the correlation between speech and facial motion thus provides additional valuable information for analyzing humans, particularly if visual data are noisy, missing, or ambiguous."

The team first collected a new dataset of 4D face scans captured together with speech.
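The article does not detail the model architecture, but the general idea of speech-driven 3D facial animation can be sketched as a network that maps a short window of audio features to per-vertex displacements of a neutral template face mesh. The sketch below is a minimal illustration of that idea, not the researchers' actual method; all dimensions, the two-layer MLP, and the randomly initialized weights are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (assumptions for illustration, not from the paper):
N_VERTICES = 5023   # vertex count of a typical 3D head mesh (assumption)
AUDIO_DIM = 29      # per-frame audio feature size, e.g. spectral features (assumption)
WINDOW = 16         # audio frames contributing to one animation frame (assumption)
HIDDEN = 64         # hidden layer width (assumption)

# Randomly initialized weights stand in for a trained network.
W1 = rng.standard_normal((WINDOW * AUDIO_DIM, HIDDEN)) * 0.01
W2 = rng.standard_normal((HIDDEN, N_VERTICES * 3)) * 0.01
template = rng.standard_normal((N_VERTICES, 3))  # neutral face mesh (stand-in)

def animate_frame(audio_window: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Map one window of audio features to an animated mesh:
    a small MLP predicts per-vertex 3D displacements, which are
    added to the neutral template's vertex positions."""
    x = audio_window.reshape(-1)              # flatten (WINDOW, AUDIO_DIM)
    h = np.tanh(x @ W1)                       # hidden layer
    offsets = (h @ W2).reshape(N_VERTICES, 3) # per-vertex displacement
    return template + offsets                 # animated vertex positions

# One window of audio features -> one animated mesh frame.
audio = rng.standard_normal((WINDOW, AUDIO_DIM))
mesh = animate_frame(audio, template)
print(mesh.shape)  # (5023, 3)
```

Running this per audio window would yield a sequence of meshes, i.e. an animation driven purely by sound; in a real system the weights would be learned from paired audio and 4D scan data such as the dataset described above.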
Jun-25-2019, 21:16:14 GMT