A Survey of Resource-efficient LLM and Multimodal Foundation Models

Xu, Mengwei, Yin, Wangsong, Cai, Dongqi, Yi, Rongjie, Xu, Daliang, Wang, Qipeng, Wu, Bingyang, Zhao, Yihao, Yang, Chen, Wang, Shihe, Zhang, Qiyang, Lu, Zhenyan, Zhang, Li, Wang, Shangguang, Li, Yuanchun, Liu, Yunxin, Jin, Xin, Liu, Xuanzhe

arXiv.org Artificial Intelligence 

In the rapidly evolving field of artificial intelligence (AI), a paradigm shift is underway. We are witnessing the transition from specialized, fragmented deep learning models to versatile, one-size-fits-all foundation models. These advanced AI systems can operate in an open-world context, interacting with open vocabularies and image pixels to handle unseen AI tasks, i.e., they exhibit zero-shot abilities. They are exemplified by (1) Large Language Models (LLMs) such as GPTs [39], which can ingest almost every NLP task in the form of a prompt; (2) Vision Transformer models (ViTs) such as the Masked Autoencoder [133], which can handle various downstream vision tasks; (3) Latent Diffusion Models (LDMs) such as Stable Diffusion [310], which generate high-quality images from arbitrary text prompts; and (4) multimodal models such as CLIP [296] and ImageBind [116], which map data of different modalities into the same latent space and are widely used as backbones for cross-modality tasks such as image retrieval/search and visual question answering. Such flexibility and generality mark a significant departure from the earlier era of AI and set a new standard for how AI interfaces with the world. The success of these foundation models is deeply rooted in their scalability: unlike their predecessors, their accuracy and generalization ability continue to improve with more data or parameters, without altering the underlying simple algorithms and architectures.