Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey

Nemani, Praneeth, Krishna, G. Sai, Kundrapu, Supriya

Jun-14-2023–arXiv.org Artificial Intelligence

Speaker-independent visual speech recognition (VSR) is a complex task that involves identifying spoken words or phrases from video recordings of a speaker's facial movements. Over the years, a considerable body of research on VSR has applied different algorithms and datasets to evaluate system performance. These efforts have produced significant progress in developing effective VSR models and have opened new opportunities for further research. This survey examines in detail the progression of VSR over the past three decades, with particular emphasis on the transition from speaker-dependent to speaker-independent systems. We also provide a comprehensive overview of the datasets used in VSR research and the preprocessing techniques employed to achieve speaker independence. The survey covers works published from 1990 to 2023, analyzing each work and comparing them on various parameters. It outlines the development of VSR systems over time and highlights the need to develop end-to-end pipelines for speaker-independent VSR. A pictorial representation offers a clear and concise overview of the techniques used in speaker-independent VSR, aiding the comprehension and analysis of the various methodologies. The survey also highlights the strengths and limitations of each technique and provides insights into developing novel approaches for analyzing visual speech cues. Overall, this comprehensive review provides insights into the current state of the art in speaker-independent VSR and highlights potential areas for future research.

dataset, recognition, speech recognition

Country:
- Asia > India (0.04)
- South America (0.04)
- North America
  - United States (0.04)
  - Central America (0.04)
- Europe
  - Austria > Vienna (0.14)
  - United Kingdom > England
    - Surrey > Guildford (0.04)
  - Switzerland > Vaud
    - Lausanne (0.04)
  - Netherlands > South Holland
    - Delft (0.04)
  - Greece
    - Ionian Islands > Corfu (0.04)
    - Central Macedonia > Thessaloniki (0.04)
  - Germany
    - Schleswig-Holstein > Kiel (0.04)
    - Hamburg (0.04)
    - Baden-Württemberg > Freiburg (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Bouches-du-Rhône > Marseille (0.04)

Genre:
- Overview (1.00)
- Research Report
  - Promising Solution (1.00)
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area (0.92)
- Media (0.67)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Data Science
    - Data Mining (0.94)
    - Data Quality (0.93)
  - Artificial Intelligence
    - Vision > Face Recognition (1.00)
    - Speech > Speech Recognition (1.00)
    - Natural Language (1.00)
    - Cognitive Science (1.00)
    - Representation & Reasoning > Uncertainty
      - Fuzzy Logic (0.93)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (1.00)
      - Learning Graphical Models > Directed Networks
        - Bayesian Learning (0.92)