OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers

Jun-14-2026, 20:09:13 GMT–Neural Information Processing Systems

Lip synchronization is the task of aligning a speaker's lip movements in video with the corresponding speech audio, and it is essential for creating realistic, expressive video content. However, existing methods often rely on reference frames and masked-frame inpainting, which limits their identity consistency and their robustness to pose variations, facial occlusions, and stylized content. In addition, because audio signals provide weaker conditioning than visual cues, lip shape leakage from the original video can degrade lip sync quality.

Genre:
- Research Report > Experimental Study (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Vision > Face Recognition (1.00)
  - Natural Language (1.00)
  - Machine Learning (1.00)
