Multimodal Deep Learning

Akkus, Cem, Chu, Luyang, Djakovic, Vladana, Jauch-Walser, Steffen, Koch, Philipp, Loss, Giacomo, Marquardt, Christopher, Moldovan, Marco, Sauter, Nadja, Schneider, Maximilian, Schulte, Rickmer, Urbanczyk, Karol, Goschenhofer, Jann, Heumann, Christian, Hvingelby, Rasmus, Schalk, Daniel, Aßenmacher, Matthias

arXiv.org Artificial Intelligence 

FIGURE 1: LMU seal (left) style-transferred to Van Gogh's Sunflower painting (center) and blended with the prompt "Van Gogh, sunflowers" via CLIP+VQGAN (right).

In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements on single-modality models, large-scale multimodal approaches have become a very active area of research. In this seminar, we reviewed these approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other (Chapter 3.1 and Chapter 3.2), as well as models in which one modality is utilized to enhance representation learning for the other (Chapter 3.3 and Chapter 3.4). To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced (Chapter 3.5). Finally, we also cover other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose multimodal models (Chapter 4.3), which are able to handle different tasks on different modalities within one unified architecture.
