A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning

Nov-23-2024–arXiv.org Artificial Intelligence

Audio-visual correlation learning aims to capture and understand the natural phenomena that relate audio and visual data. The rapid growth of deep learning has propelled the development of methods that process audio-visual data, as reflected in the number of proposals published in recent years, and this motivates a comprehensive survey. Besides analyzing the models used in this context, we also discuss task definitions and paradigms applied in AI multimedia. In addition, we investigate frequently used objective functions and discuss how audio-visual data is exploited in the optimization process, i.e., the different methodologies for representing knowledge in the audio-visual domain. In particular, we focus on how human-understandable mechanisms, i.e., structured knowledge that reflects comprehensible knowledge, can guide the learning process. Most importantly, we summarize the recent progress of Audio-Visual Correlation Learning (AVCL) and discuss future research directions.

Country:
- North America
  - United States
    - Colorado (0.04)
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.04)
  - Canada > Ontario
    - National Capital Region > Ottawa (0.04)
- Europe
  - United Kingdom > Scotland
    - City of Glasgow > Glasgow (0.04)
  - Switzerland > Basel-City
    - Basel (0.04)
  - Slovenia > Central Slovenia
    - Municipality of Ljubljana > Ljubljana (0.04)
  - Portugal > Porto
    - Porto (0.04)
  - France > Provence-Alpes-Côte d'Azur
    - Alpes-Maritimes > Nice (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - China (0.04)
  - South Korea
    - Seoul > Seoul (0.04)
    - Incheon > Incheon (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture
      - Tokyo (0.14)
    - Chūgoku > Hiroshima Prefecture
      - Hiroshima (0.04)
    - Chūbu > Ishikawa Prefecture
      - Kanazawa (0.04)
- Africa > Eswatini
  - Manzini > Manzini (0.04)

Genre:
- Overview (1.00)

Industry:
- Leisure & Entertainment (1.00)
- Education (1.00)
- Media > Music (0.67)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (1.00)
  - Artificial Intelligence
    - Vision (1.00)
    - Speech (1.00)
    - Representation & Reasoning (1.00)
    - Cognitive Science (0.93)
    - Natural Language > Large Language Model (0.92)
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Statistical Learning (0.92)
