Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Wei, Yake, Hu, Di, Tian, Yapeng, Li, Xuelong
–arXiv.org Artificial Intelligence
Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception, audio-visual learning, which aims to develop computational approaches that learn from both audio and visual modalities, has flourished in recent years. A comprehensive survey that systematically organizes and analyzes studies in the audio-visual field is therefore needed. Starting from an analysis of the cognitive foundations of audio-visual perception, we introduce several key findings that have inspired our computational studies. We then systematically review recent audio-visual learning studies and divide them into three categories: audio-visual boosting, cross-modal perception, and audio-visual collaboration. Through our analysis, we find that the consistency of audio-visual data across the semantic, spatial, and temporal dimensions supports the above studies. To revisit the current development of the audio-visual learning field from a broader view, we further propose a new perspective on audio-visual scene understanding, and then discuss and analyze feasible future directions for the area. Overall, this survey reviews the current audio-visual learning field and looks ahead to its future from different aspects. We hope it can provide researchers with a better understanding of this area. A continuously updated version of the survey is available at https://gewu-lab.github.io/audio-visual-learning/
Aug-19-2022
- Country:
  - Asia
    - China
      - Beijing > Beijing (0.04)
      - Shaanxi Province > Xi'an (0.04)
    - Japan > Honshū
      - Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
  - Europe > Poland
    - Opole Province > Opole (0.04)
  - North America > United States
    - Indiana > Marion County
      - Lawrence (0.04)
    - Texas > Dallas County
      - Dallas (0.04)
      - Richardson (0.04)
- Genre:
  - Overview (1.00)
  - Research Report (1.00)
- Industry:
  - Health & Medicine > Therapeutic Area
    - Neurology (1.00)
  - Leisure & Entertainment (1.00)
  - Media > Music (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Cognitive Science > Neuroscience (0.67)
      - Machine Learning
      - Natural Language > Text Processing (0.92)
      - Representation & Reasoning (1.00)
      - Speech > Speech Recognition (0.69)
      - Vision > Image Understanding (0.68)
    - Communications (1.00)
    - Sensing and Signal Processing > Image Processing (1.00)