OmniFusion Technical Report

Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

arXiv.org Artificial Intelligence

In recent years, multimodal architectures have emerged as a powerful paradigm for enhancing artificial intelligence (AI) systems, enabling them to process and understand multiple types of data simultaneously [1, 2, 3]. The integration of different data modalities, such as text and images, has significantly improved the capabilities of large language models (LLMs) in various tasks, ranging from visual question answering (VQA) [4] to complex decision-making processes [5, 6]. However, effectively coupling various data types remains a significant obstacle in the development of truly integrative AI models. Furthermore, such multimodal multitask architectures are regarded as first steps towards the development of artificial general intelligence (AGI), broadening the range of challenges in world cognition. This work presents the OmniFusion model, a novel multimodal architecture that leverages the strengths of pretrained LLMs and introduces specialized adapters for processing visual information.
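The abstract describes coupling a pretrained LLM with adapters that map visual information into the language model's input space. As a purely illustrative sketch of this general adapter idea (the encoder, adapter architecture, dimensions, and names below are assumptions for illustration, not the paper's actual design), one could project features from a frozen image encoder with a small MLP and prepend the resulting visual tokens to the text embeddings fed to the LLM:

```python
# Hypothetical sketch of a visual adapter: features from a frozen image encoder
# are projected into the LLM embedding space and prepended to text embeddings.
# Dimensions and module names are illustrative, not taken from OmniFusion.
import torch
import torch.nn as nn


class VisualAdapter(nn.Module):
    """Two-layer MLP mapping image-encoder features to the LLM embedding size."""

    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096, hidden: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vis_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, llm_dim),
        )

    def forward(self, vis_feats: torch.Tensor) -> torch.Tensor:
        # vis_feats: (batch, num_patches, vis_dim) -> (batch, num_patches, llm_dim)
        return self.proj(vis_feats)


def fuse(vis_feats: torch.Tensor, text_embeds: torch.Tensor,
         adapter: VisualAdapter) -> torch.Tensor:
    """Prepend projected visual tokens to the text-token embeddings."""
    visual_tokens = adapter(vis_feats)
    return torch.cat([visual_tokens, text_embeds], dim=1)


# Toy usage: 577 patch features from a CLIP-like encoder and 32 text tokens.
adapter = VisualAdapter()
vis = torch.randn(1, 577, 1024)
txt = torch.randn(1, 32, 4096)
print(fuse(vis, txt, adapter).shape)  # torch.Size([1, 609, 4096])
```

In this style of architecture, only the adapter (and optionally the LLM) is trained, while the image encoder typically stays frozen; the exact training recipe and adapter design used by OmniFusion are specified in the report itself.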
