Lightweight Operations for Visual Speech Recognition

Iason Ioannis Panagos, Giorgos Sfikas, Christophoros Nikou

Feb-7-2025, arXiv.org Artificial Intelligence

Visual speech recognition (VSR), which decodes spoken words from video data, offers significant benefits, particularly when audio is unavailable. However, the high dimensionality of video data leads to prohibitive computational costs that demand powerful hardware, limiting VSR deployment on resource-constrained devices. This work addresses this limitation by developing lightweight VSR architectures. Leveraging efficient operation design paradigms, we create compact yet powerful models with reduced resource requirements and minimal accuracy loss. We train and evaluate our models on a large-scale public dataset for recognition of words from video sequences, demonstrating their effectiveness for practical applications. We also conduct an extensive array of ablative experiments to thoroughly analyze the size and complexity of each model. Code and trained models will be made publicly available.
