Megrez-Omni Technical Report

Boxun Li, Yadong Li, Zhiyuan Li, Congyi Liu, Weilin Liu, Guowei Niu, Zheyue Tan, Haiyang Xu, Zhuyu Yao, Tao Yuan, Dong Zhou, Yueqing Zhuang, Shengen Yan, Guohao Dai, Yu Wang

Feb-19-2025 – arXiv.org Artificial Intelligence

In this work, we present the Megrez models, comprising a language model (Megrez-3B-Instruct) and a multimodal model (Megrez-3B-Omni). These models are designed to deliver fast inference, compactness, and robust edge-side intelligence through a software-hardware co-design approach. Megrez-3B-Instruct offers several advantages, including high accuracy, high speed, ease of use, and a wide range of applications. Building on Megrez-3B-Instruct, Megrez-3B-Omni is an on-device multimodal understanding LLM that supports image, text, and audio analysis. It achieves state-of-the-art accuracy across all three modalities and demonstrates strong versatility and robustness, setting a new benchmark for multimodal AI models.

Country:
- Asia > China > Shanghai > Shanghai (0.04)

Genre:
- Research Report (0.64)

Industry:
- Education (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)
