AITopics | lipreading

lipreading

Learning Speaker-Invariant Visual Features for Lipreading

Li, Yu, Xue, Feng, Li, Shujie, Zhang, Jinrui, Yang, Shuang, Guo, Dan, Hong, Richang

arXiv.org Artificial Intelligence | Jun-10-2025

Lipreading is a challenging cross-modal task that aims to convert visual lip movements into spoken text. Existing lipreading methods often extract visual features that include speaker-specific lip attributes (e.g., shape, color, texture), which introduce spurious correlations between vision and text. These correlations lead to suboptimal lipreading accuracy and restrict model generalization. To address this challenge, we introduce SIFLip, a speaker-invariant visual feature learning framework that disentangles speaker-specific attributes using two complementary disentanglement modules (Implicit Disentanglement and Explicit Disentanglement) to improve generalization. Specifically, since different speakers exhibit semantic consistency between lip movements and phonetic text when pronouncing the same words, our implicit disentanglement module leverages stable text embeddings as supervisory signals to learn common visual representations across speakers, implicitly decoupling speaker-specific features. Additionally, we design a speaker recognition sub-task within the main lipreading pipeline to filter speaker-specific features, then further explicitly disentangle these personalized visual features from the backbone network via gradient reversal. Experimental results demonstrate that SIFLip significantly improves generalization performance across multiple public datasets, outperforming state-of-the-art methods.
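
The gradient reversal step named in the abstract can be illustrated with a toy scalar example. This is a minimal sketch of the general technique only, not SIFLip's implementation: the scalar backbone weight `w`, the speaker-head weight `v`, the squared-error speaker loss, and all function names are assumptions made up for illustration.

```python
# Toy illustration of a gradient reversal layer (GRL).
# All names and the scalar setup are hypothetical, chosen for illustration.


def grl_forward(feature):
    """Forward pass: the layer is the identity."""
    return feature


def grl_backward(upstream_grad, lam=1.0):
    """Backward pass: negate (and scale by lam) the incoming gradient."""
    return -lam * upstream_grad


def speaker_branch_grads(w, v, x, target, lam=1.0):
    """Gradients of a toy speaker loss 0.5 * (v * f - target)**2, with f = w * x.

    Returns (grad_w, grad_v): grad_w is what the backbone weight receives
    after the reversal layer; grad_v is the ordinary gradient for the
    speaker head.
    """
    feature = w * x                            # backbone output
    score = v * grl_forward(feature)           # speaker head sees the feature unchanged
    err = score - target                       # d(loss)/d(score)
    grad_v = err * feature                     # speaker head: normal gradient
    grad_feature = grl_backward(err * v, lam)  # reversed before reaching the backbone
    grad_w = grad_feature * x
    return grad_w, grad_v


if __name__ == "__main__":
    print(speaker_branch_grads(w=2.0, v=3.0, x=1.0, target=0.0))
```

In this sketch the speaker head still descends on its own loss, while the backbone weight receives the negated gradient, so a gradient step pushes the shared feature toward being less useful for identifying the speaker. The factor `lam` (an assumed hyperparameter) controls how strongly that reversed signal is weighted.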

artificial intelligence, image understanding, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2506.07572

Country: Asia (0.46)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lipreading by neural networks: Visual preprocessing, learning, and sensory integration

Neural Information Processing Systems | Apr-6-2023, 18:57:18 GMT

We have developed visual preprocessing algorithms for extracting phonologically relevant features from the grayscale video image of a speaker, to provide speaker-independent inputs for an automatic lipreading ("speechreading") system. Visual features such as mouth open/closed, tongue visible/not-visible, teeth visible/not-visible, and several shape descriptors of the mouth and its motion are all rapidly computable in a manner quite insensitive to lighting conditions. We formed a hybrid speechreading system consisting of two time delay neural networks (video and acoustic) and integrated their responses by means of independent opinion pooling - the Bayesian optimal method given conditional independence, which seems to hold for our data. This hybrid system had an error rate 25% lower than that of the acoustic subsystem alone on a five-utterance speaker-independent task, indicating that video can be used to improve speech recognition.
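
The independent opinion pooling mentioned in the abstract can be sketched as follows. Assuming the acoustic and visual observations are conditionally independent given the utterance class, Bayes' rule gives P(c | A, V) proportional to P(c | A) * P(c | V) / P(c). The function name, the dictionary representation, the uniform default prior, and the class labels in the usage example are assumptions for illustration, not details from the paper.

```python
# Sketch of independent opinion pooling for two classifiers' posteriors.
# Function name, data layout, and default prior are hypothetical.


def independent_opinion_pool(p_acoustic, p_visual, prior=None):
    """Fuse two per-class posteriors under conditional independence.

    p_acoustic, p_visual: dicts mapping class label -> posterior probability
    from each subsystem. prior: dict of class priors (uniform if omitted,
    and assumed non-zero for every class).
    Returns a dict with the normalized fused posterior.
    """
    classes = list(p_acoustic)
    if prior is None:
        prior = {c: 1.0 / len(classes) for c in classes}

    # P(c | A, V) is proportional to P(c | A) * P(c | V) / P(c).
    unnormalized = {c: p_acoustic[c] * p_visual[c] / prior[c] for c in classes}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the two subsystems assign zero joint support to every class")
    return {c: value / total for c, value in unnormalized.items()}


if __name__ == "__main__":
    acoustic = {"ba": 0.5, "da": 0.5}  # acoustically ambiguous
    visual = {"ba": 0.8, "da": 0.2}    # visually clearer
    print(independent_opinion_pool(acoustic, visual))
```

When one subsystem is undecided, as the acoustic one is in the usage example, the fused posterior follows the other subsystem, which is the behavior one would want from complementary acoustic and visual signals.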

lipreading, neural network, sensory integration, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

Surface Learning with Applications to Lipreading

Neural Information Processing Systems | Apr-6-2023, 18:56:57 GMT

Most connectionist research has focused on learning mappings from one space to another (eg. This paper introduces the more general task of learning constraint surfaces. It describes a simple but powerful architecture for learning and manipulating nonlinear surfaces from data. We demonstrate the technique on low dimensional synthetic surfaces and compare it to nearest neighbor approaches. We then show its utility in learning the space of lip images in a system for improving speech recognition by lip reading.

application, lipreading, surface learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning (0.42)

Beyond Lipreading: Visual Speech Recognition Looks You in the Eye

#artificialintelligence | Mar-27-2020, 12:34:34 GMT

Like the lipreading spies of yesteryear peering through their binoculars, almost all visual speech recognition (VSR) research these days focuses on mouth and lip motion. But a new study suggests that VSR models could perform even better if they used additional available visual information. The VSR field typically looks at the mouth region, since lip shape and motion are believed to contain almost all the information correlated with speech. As a result, information in other facial regions has been considered weak by default. But a new paper from the Key Laboratory of Intelligent Information Processing of the Chinese Academy of Sciences and the University of Chinese Academy of Sciences proposes that information from extraoral facial regions can consistently benefit SOTA VSR model performance.

dataset, facial region, visual speech recognition look, (5 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.52)
Research Report > Experimental Study (0.37)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.76)

Lipper: Synthesizing Thy Speech using Multi-View Lipreading

Kumar, Yaman, Jain, Rohit, Salik, Khwaja Mohd., Shah, Rajiv Ratn, Yin, Yifang, Zimmermann, Roger

arXiv.org Machine Learning | Jun-28-2019

Lipreading has a lot of potential applications such as in the domain of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task like its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading to audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of audio produced. We also perform a user study for the audios produced in order to understand the level of comprehensibility of audios produced using Lipper.

artificial intelligence, lipper, machine learning, (15 more...)

arXiv.org Machine Learning

1907.01367

Country: Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Lipreading by neural networks: Visual preprocessing, learning, and sensory integration

Wolff, Gregory J., Prasad, K. Venkatesh, Stork, David G., Hennecke, Marcus

Neural Information Processing Systems | Dec-31-1994

Automated speech recognition is notoriously hard, and thus any predictive source of information and constraints that could be incorporated into a computer speech recognition system would be desirable. Humans, especially the hearing impaired, can utilize visual information - "speech reading" - for improved accuracy (Dodd & Campbell, 1987; Sanders & Goodrich, 1971). Speech reading can provide direct information about segments, phonemes, rate, speaker gender and identity, and subtle information for segmenting speech from background noise or multiple speakers (De Filippo & Sims, 1988; Green & Miller, 1985). Fundamental support for the use of visual information comes from the complementary nature of the visual and acoustic speech signals. Utterances that are difficult to distinguish acoustically are the easiest to distinguish visually.

information, probability, recognition, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Mateo County > Menlo Park (0.05)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)

Surface Learning with Applications to Lipreading

Bregler, Christoph, Omohundro, Stephen M.

Neural Information Processing Systems | Dec-31-1994

Most connectionist research has focused on learning mappings from one space to another (eg.

dimension, query, surface learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Illinois (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Lipreading by neural networks: Visual preprocessing, learning, and sensory integration

Wolff, Gregory J., Prasad, K. Venkatesh, Stork, David G., Hennecke, Marcus

Neural Information Processing Systems | Dec-31-1994

Automated speech recognition is notoriously hard, and thus any predictive source of information and constraints that could be incorporated into a computer speech recognition system would be desirable. Humans, especially the hearing impaired, can utilize visual information - "speech reading" - for improved accuracy (Dodd & Campbell, 1987; Sanders & Goodrich, 1971). Speech reading can provide direct information about segments, phonemes, rate, speaker gender and identity, and subtle information for segmenting speech from background noise or multiple speakers (De Filippo & Sims, 1988; Green & Miller, 1985). Fundamental support for the use of visual information comes from the complementary nature of the visual and acoustic speech signals. Utterances that are difficult to distinguish acoustically are the easiest to distinguish visually.

artificial intelligence, machine learning, recognition, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.30)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
