lip sync
Style-Preserving Lip Sync via Audio-Aware Style Reference
Weizhi Zhong, Jichang Li, Yinqi Cai, Liang Lin, Guanbin Li
Audio-driven lip sync has recently drawn significant attention due to its widespread application in the multimedia domain. Individuals exhibit distinct lip shapes when speaking the same utterance, owing to their unique speaking styles, which poses a notable challenge for audio-driven lip sync. Earlier methods for this task often bypassed the modeling of personalized speaking styles, resulting in sub-optimal lip sync that conforms only to generic styles. Recent lip sync techniques attempt to guide the lip sync for arbitrary audio by aggregating information from a style reference video, yet they cannot preserve speaking styles well because their style aggregation is inaccurate. This work proposes an audio-aware style reference scheme that effectively leverages the relationships between the input audio and the reference audio from the style reference video to address style-preserving audio-driven lip sync. Specifically, we first develop a Transformer-based model that predicts lip motion corresponding to the input audio, augmented by style information aggregated through cross-attention layers from the style reference video. Afterwards, to render the lip motion into realistic talking-face video, we devise a conditional latent diffusion model that integrates lip motion through modulated convolutional layers and fuses reference facial images via spatial cross-attention layers. Extensive experiments validate the efficacy of the proposed approach in achieving precise lip sync, preserving speaking styles, and generating high-fidelity, realistic talking-face videos.
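The style-aggregation step the abstract describes — queries derived from the input audio attending over features of the style reference — can be sketched as a single cross-attention layer. This is a minimal illustration, not the paper's implementation: the function names, feature dimensions, and the use of plain NumPy are all assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(audio_feats, style_feats):
    """Aggregate style information for each audio frame.

    audio_feats: (T_audio, d) features of the input audio (queries).
    style_feats: (T_style, d) features from the style reference (keys/values).
    Returns a (T_audio, d) style vector per audio frame: a weighted
    average of reference features, weighted by audio-reference similarity.
    """
    d = audio_feats.shape[-1]
    scores = audio_feats @ style_feats.T / np.sqrt(d)   # (T_audio, T_style)
    weights = softmax(scores, axis=-1)                  # rows sum to 1
    return weights @ style_feats                        # (T_audio, d)
```

In the actual model, learned projection matrices would map audio and reference features into query/key/value spaces before this step; the sketch omits them to show only the aggregation mechanism itself.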
A deep learning technique to generate real-time lip sync for live 2-D animation
Live 2-D animation is a fairly new and powerful form of communication that allows human performers to control cartoon characters in real time while interacting and improvising with other actors or members of an audience. Recent examples include Stephen Colbert interviewing cartoon guests on The Late Show, Homer answering live phone-in questions from viewers during a segment of The Simpsons, Archer talking to a live audience at ComicCon, and the stars of Disney's Star vs. The Forces of Evil and My Little Pony hosting live chat sessions with fans via YouTube or Facebook Live. Producing realistic and effective live 2-D animations requires the use of interactive systems that can automatically transform human performances into animations in real time. A key aspect of these systems is attaining a good lip sync, which essentially means that the mouths of animated characters move appropriately when speaking, mimicking the movements observed in the mouths of performers.
- Media > Television (0.55)
- Leisure & Entertainment (0.55)
Canny AI: Imagine world leaders singing
Deep Learning is really starting to establish itself as a major new tool in visual effects. The tools are still in their infancy, but they are changing the way visual effects can be approached. Instead of a pipeline consisting of modelling, texturing, lighting and rendering, these new approaches hallucinate, or plausibly create, imagery based on training data sets. Machine Learning, the superset of Deep Learning, and similar approaches have had great success in image classification, image recognition and image synthesis. At fxguide we covered Synthesia in the UK, a company born out of research first published as Face2Face.
- Asia > South Korea (0.31)
- Europe > United Kingdom (0.25)
- North America > United States (0.15)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.05)
- Media (1.00)
- Leisure & Entertainment (0.96)
- Government > Regional Government > Asia Government (0.30)
- Government > Military > Army (0.30)
Big Mouth Billy Bass is back! New $40 version of hit toy works with Amazon's Alexa smart speaker
It is one of the most irritating toys ever made - and has been given a hi-tech makeover. The original Big Mouth Billy Bass infuriated many with its incessant flapping and singing. Now, it can lip sync to anything Alexa says, and even dance along to music.
'Deep fakes': Sorting fact from fiction in the fake-Obama video era
It always starts with porn. What first revealed the internet's power to distribute information? Porn has historically been a reliable canary in the coal mine, so the "deep fakes" video Vice found in late 2017 has lawmakers paying attention. Using free machine-learning platforms, people on Reddit superimposed the face of Wonder Woman's Gal Gadot on a porn actress's body in a creepy, almost-convincing sex video. Researchers use "Real-time Face Capture" on Russian President Vladimir Putin.
- North America > United States (0.99)
- Asia > Russia (0.58)