Lip to Speech Synthesis with Visual Context Attentional GAN

Neural Information Processing Systems

In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis. Specifically, the proposed VCA-GAN synthesizes speech from local lip visual features by finding a viseme-to-phoneme mapping, while global visual context is embedded into the intermediate layers of the generator to resolve the ambiguity in the mapping induced by homophenes. To achieve this, a visual context attention module is proposed that encodes global representations from the local visual features and, through audio-visual attention, provides the generator with the global visual context corresponding to a given coarse speech representation. In addition to the explicit modelling of local and global visual representations, synchronization learning is introduced as a form of contrastive learning that guides the generator to synthesize speech in sync with the input lip movements. Extensive experiments demonstrate that the proposed VCA-GAN outperforms existing state-of-the-art methods and can effectively synthesize speech from multiple speakers, a setting barely handled in previous works.
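To make the idea concrete, below is a minimal PyTorch sketch of the two mechanisms the abstract describes: an audio-visual attention step in which coarse speech representations query global visual context, and an InfoNCE-style contrastive synchronization loss. Layer choices, dimensions, and the exact loss variant are illustrative assumptions, not the authors' released VCA-GAN implementation.

```python
import torch
import torch.nn as nn

class VisualContextAttention(nn.Module):
    """Sketch of audio-visual attention: coarse speech features attend
    over global visual features. Dimensions are assumed, not VCA-GAN's."""
    def __init__(self, speech_dim=256, visual_dim=512, attn_dim=256):
        super().__init__()
        self.query = nn.Linear(speech_dim, attn_dim)  # from coarse speech
        self.key = nn.Linear(visual_dim, attn_dim)    # from global visual context
        self.value = nn.Linear(visual_dim, attn_dim)
        self.out = nn.Linear(attn_dim, speech_dim)

    def forward(self, speech_feats, visual_feats):
        # speech_feats: (B, T_a, speech_dim); visual_feats: (B, T_v, visual_dim)
        q = self.query(speech_feats)
        k = self.key(visual_feats)
        v = self.value(visual_feats)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        context = attn @ v                       # (B, T_a, attn_dim)
        # Inject the attended visual context back into the speech stream.
        return speech_feats + self.out(context)

def sync_contrastive_loss(audio_emb, visual_emb, temperature=0.07):
    """InfoNCE-style synchronization loss: matched audio/visual clips are
    positives, other pairs in the batch are negatives (an assumed variant
    of the synchronization learning described in the abstract)."""
    audio_emb = nn.functional.normalize(audio_emb, dim=-1)
    visual_emb = nn.functional.normalize(visual_emb, dim=-1)
    logits = audio_emb @ visual_emb.t() / temperature  # (B, B)
    targets = torch.arange(logits.size(0), device=logits.device)
    return nn.functional.cross_entropy(logits, targets)
```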


SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model

Wang, Kaidi, He, Yi, Guan, Wenhao, Wu, Weijie, Ding, Hongwu, Zhang, Xiong, Wu, Di, Meng, Meng, Luan, Jian, Li, Lin, Hong, Qingyang

arXiv.org Artificial Intelligence

Video dubbing aims to generate high-fidelity speech that is precisely temporally aligned with the visual content. Existing methods still suffer from limited speech naturalness and audio-visual synchronization, and are restricted to monolingual settings. To address these challenges, we propose SyncVoice, a vision-augmented video dubbing framework built upon a pretrained text-to-speech (TTS) model. By fine-tuning the TTS model on audio-visual data, we achieve strong audio-visual consistency. We propose a Dual Speaker Encoder to effectively mitigate inter-language interference in cross-lingual speech synthesis, and we explore the application of video dubbing in video translation scenarios. Experimental results show that SyncVoice achieves high-fidelity speech generation with strong synchronization performance, demonstrating its potential for video dubbing tasks.
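The abstract does not detail the Dual Speaker Encoder, but one plausible reading is two speaker branches whose embeddings are fused by a learned gate, for example a language-agnostic timbre branch and a language-specific style branch. The sketch below is purely a hypothetical PyTorch illustration of that idea; the module names, dimensions, and gating scheme are assumptions, not SyncVoice's published architecture.

```python
import torch
import torch.nn as nn

class DualSpeakerEncoder(nn.Module):
    """Hypothetical dual speaker encoder: a timbre branch and a style
    branch over a reference mel-spectrogram, fused by a learned gate."""
    def __init__(self, mel_dim=80, emb_dim=256):
        super().__init__()
        self.timbre = nn.GRU(mel_dim, emb_dim, batch_first=True)
        self.style = nn.GRU(mel_dim, emb_dim, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * emb_dim, emb_dim), nn.Sigmoid())

    def forward(self, ref_mel):
        # ref_mel: (B, T, mel_dim) reference utterance of the target speaker
        _, t = self.timbre(ref_mel)          # final hidden state: (1, B, emb_dim)
        _, s = self.style(ref_mel)
        t, s = t.squeeze(0), s.squeeze(0)
        g = self.gate(torch.cat([t, s], dim=-1))
        return g * t + (1 - g) * s           # fused speaker embedding (B, emb_dim)
```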



Language Without Borders: A Dataset and Benchmark for Code-Switching Lip Reading

Neural Information Processing Systems

Lip reading aims to transform videos of continuous lip movement into text, and has achieved significant progress over the past decade. It serves as a critical and practical aid for speech-impaired individuals, and is more practical than speech recognition in noisy environments.




A Lightweight Pipeline for Noisy Speech Voice Cloning and Accurate Lip Sync Synthesis

Amir, Javeria, Attaria, Farwa, Jabeen, Mah, Noor, Umara, Rashid, Zahid

arXiv.org Artificial Intelligence

Recent developments in voice cloning and talking-head generation demonstrate impressive capabilities in synthesizing natural speech and realistic lip synchronization. Current methods are typically trained on large-scale datasets through computationally intensive processes using clean, studio-recorded inputs, which is infeasible in noisy or low-resource environments. In this paper, we introduce a new modular pipeline comprising Tortoise TTS, a transformer-based latent diffusion model that can perform high-fidelity zero-shot voice cloning given only a few training samples, and Wav2Lip, a lightweight generative adversarial network architecture for robust real-time lip synchronization. The solution contributes to several essential goals: less reliance on massive pretraining, generation of emotionally expressive speech, and lip sync in noisy and unconstrained scenarios. In addition, the modular structure of the pipeline allows easy extension to future multimodal and text-guided voice modulation, and it could be used in real-world systems. Our experimental results show that the proposed system produces competitive sound quality and lip sync at a much smaller computational cost, indicating the possibility of deploying it in resource-constrained scenarios.

Keywords: Zero-Shot Voice Cloning, Latent Diffusion Models, Real-Time Lip Synchronization, GAN-Based Talking-Head Generation, Low-Resource Speech Synthesis, Emotionally Expressive Speech

1. Introduction

Voice cloning and talking-head generation systems have made tremendous progress in the past few years, benefiting from the development of deep generative models. These systems can be employed for virtual assistants, entertainment, telepresence, and assistive communication, making human-computer interaction more realistic and personalized based on interactive audio-visual context. Despite these advancements, state-of-the-art solutions rely heavily on big data and sophisticated computational resources, and therefore may not be practical for real-world low-resource or noisy settings.
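Because the pipeline is modular, gluing the two stages together is straightforward. The sketch below follows the public tortoise-tts Python API (TextToSpeech, tts_with_preset, load_audio) and Wav2Lip's inference.py command-line interface; exact signatures, sample rates, and checkpoint paths should be verified against the versions you install, and the file names here are placeholders.

```python
import subprocess
import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_audio

def clone_voice(text, sample_paths, out_wav="cloned.wav"):
    """Zero-shot voice cloning with Tortoise TTS, conditioned on a few
    short clips of the target speaker."""
    tts = TextToSpeech()
    voice_samples = [load_audio(p, 22050) for p in sample_paths]
    gen = tts.tts_with_preset(text, voice_samples=voice_samples, preset="fast")
    torchaudio.save(out_wav, gen.squeeze(0).cpu(), 24000)  # Tortoise outputs 24 kHz
    return out_wav

def lip_sync(face_video, audio_wav, checkpoint="checkpoints/wav2lip_gan.pth"):
    """Wav2Lip ships as a CLI; shell out to its inference script, run
    from the Wav2Lip repository root."""
    subprocess.run(
        ["python", "inference.py",
         "--checkpoint_path", checkpoint,
         "--face", face_video,
         "--audio", audio_wav],
        check=True,
    )

# Example usage: clone the voice from reference clips, then dub the video.
audio = clone_voice("Hello from a cloned voice.", ["sample1.wav", "sample2.wav"])
lip_sync("speaker.mp4", audio)
```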