Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective

Chen, Chen; Li, Xiaolou; Liu, Zehua; Li, Lantian; Wang, Dong

Sep-29-2024–arXiv.org Artificial Intelligence

In spoken language processing, audio-visual speech processing is receiving increasing research attention. Key tasks in this area include lip reading, audio-visual speech recognition, and visual-to-speech synthesis. Although these tasks have achieved significant empirical success, their theoretical analysis remains limited. This paper presents a quantitative analysis based on information theory, focusing on the information intersection between different modalities. Our results show that this analysis helps explain why audio-visual processing tasks are difficult and what benefits can be obtained from integrating modalities.
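
In standard information theory, the information shared by two variables (the "intersection" of their entropies) is the mutual information I(A;V) = H(A) + H(V) - H(A,V), and the uncertainty about A that remains after observing V is the conditional entropy H(A|V) = H(A,V) - H(V). The abstract does not state which estimator the paper uses, so the snippet below is only a minimal sketch assuming discrete audio (A) and visual (V) variables with a known joint probability table. The function names and the 2x2 table are hypothetical illustrations, not the authors' method or data.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(A;V) = H(A) + H(V) - H(A,V) for a joint table joint[a][v] summing to 1."""
    p_a = [sum(row) for row in joint]            # marginal distribution of A (rows)
    p_v = [sum(col) for col in zip(*joint)]      # marginal distribution of V (columns)
    h_joint = entropy([p for row in joint for p in row])
    return entropy(p_a) + entropy(p_v) - h_joint

def conditional_entropy(joint):
    """H(A|V) = H(A,V) - H(V): uncertainty about A left after observing V."""
    p_v = [sum(col) for col in zip(*joint)]
    h_joint = entropy([p for row in joint for p in row])
    return h_joint - entropy(p_v)

# Hypothetical toy joint distribution: 2 audio classes x 2 visual classes.
toy_joint = [[0.4, 0.1],
             [0.1, 0.4]]

print(round(mutual_information(toy_joint), 4))   # prints 0.2781
print(round(conditional_entropy(toy_joint), 4))  # prints 0.7219
```

In this toy case each modality carries 1 bit, only about 0.28 bits are shared, and about 0.72 bits of A remain uncertain given V. Under this reading, a small intersection would suggest a harder cross-modal task (for example, predicting A from V alone), while the non-shared part is what a second modality could add.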

information, machine learning, natural language

Country:
- Asia
  - China (0.14)
  - Singapore (0.14)

Genre:
- Research Report > New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.47)
  - Natural Language (1.00)
  - Speech > Speech Recognition (0.68)