He, You
GaLore$+$: Boosting Low-Rank Adaptation for LLMs with Cross-Head Projection
Liao, Xutao, Li, Shaohui, Xu, Yuhui, Li, Zhi, Liu, Yu, He, You
A variety of parameter-efficient fine-tuning methods have emerged in recent years, enabling an increasing number of institutions and researchers to fine-tune LLMs to meet their specific requirements. Adapters (Rebuffi et al., 2017; Houlsby et al., 2019; Lin et al., 2020; Karimi Mahabadi et al., 2021a;b) enable parameter-efficient fine-tuning by inserting trainable layers into LLMs while keeping the other layers frozen; however, this approach introduces additional inference latency. BitFit (Zaken et al., 2021) tunes only the biases within the network, significantly reducing the number of parameters involved in fine-tuning. Prompt tuning achieves parameter efficiency by optimizing a set of new input tokens or prompts for each task (Li & Liang, 2021; Lester et al., 2021; Hambardzumyan et al., 2021; Liu et al., 2023). Hu et al. (2022) introduced LoRA, observing that weight updates during fine-tuning are low-rank and can therefore be expressed as the product of two low-rank matrices; moreover, the trainable parameters can be merged back into the original weights, eliminating additional inference latency. Recent studies have combined parameter-efficient fine-tuning with quantization to further improve memory efficiency when fine-tuning LLMs (Kwon et al., 2022; Dettmers et al., 2023; Chai et al., 2023; Xu et al., 2023). DoRA, or Weight-Decomposed Low-Rank Adaptation (Liu et al., 2024), enhances learning capacity and stability by decomposing the pre-trained weights into magnitude and direction components and applying LoRA to the directional updates, achieving superior performance across tasks without additional inference cost.
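To make the LoRA formulation above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the class name, the initialization constants, and the alpha/r scaling convention are illustrative assumptions. It shows a linear layer whose frozen weight is augmented with a trainable low-rank update W + (alpha/r) * B A that can later be merged away.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # pre-trained weight stays frozen
        self.base.bias.requires_grad_(False)
        # Low-rank factors: A starts small and random, B starts at zero,
        # so the update is initially a no-op and is learned during fine-tuning.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank correction x A^T B^T.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

    def merge(self) -> nn.Linear:
        """Fold the low-rank update into the base weight, removing inference overhead."""
        merged = nn.Linear(self.base.in_features, self.base.out_features)
        merged.weight.data = self.base.weight.data + self.scale * (self.B @ self.A)
        merged.bias.data = self.base.bias.data.clone()
        return merged

In practice, merge() would be called once after fine-tuning, so deployment uses a single dense weight with no extra matrix multiplication per token, which is the property the paragraph above refers to as eliminating additional inference latency.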
LLMs Can Evolve Continually on Modality for X-Modal Reasoning
Yu, Jiazuo, Xiong, Haomiao, Zhang, Lu, Diao, Haiwen, Zhuge, Yunzhi, Hong, Lanqing, Wang, Dong, Lu, Huchuan, He, You, Chen, Long
Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding. However, existing methods rely heavily on extensive modal-specific pretraining and joint-modal tuning, leading to significant computational burdens when expanding to new modalities. In this paper, we propose PathWeave, a flexible and scalable framework with modal-Path sWitching and ExpAnsion abilities that enables MLLMs to continually EVolve on modalities for $\mathbb{X}$-modal reasoning. We leverage the concept of Continual Learning and develop an incremental training strategy atop pre-trained MLLMs, enabling their expansion to new modalities using uni-modal data, without executing joint-modal pretraining. In detail, a novel Adapter-in-Adapter (AnA) framework is introduced, in which uni-modal and cross-modal adapters are seamlessly integrated to facilitate efficient modality alignment and collaboration. Additionally, an MoE-based gating module is applied between the two types of adapters to further enhance multimodal interaction. To evaluate the proposed method, we establish a challenging benchmark called Continual Learning of Modality (MCL), which consists of high-quality QA data from five distinct modalities: image, video, audio, depth, and point cloud. Extensive experiments demonstrate the effectiveness of the proposed AnA framework in terms of learning plasticity and memory stability during continual learning. Furthermore, PathWeave performs comparably to state-of-the-art MLLMs while reducing the parameter training burden by 98.73%. Our code is available at https://github.com/JiazuoYu/PathWeave
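As a rough illustration of the Adapter-in-Adapter idea described in this abstract, here is a speculative PyTorch sketch based only on the text above; the module names, the bottleneck size, the gate placement, and the soft-mixture routing are all assumptions, not PathWeave's actual design. A trainable uni-modal adapter for the incoming modality is combined with frozen adapters retained from previously learned modalities through an MoE-style gate.

import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Standard bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class AdapterInAdapter(nn.Module):
    """Hypothetical AnA block: a trainable uni-modal adapter for the current
    modality, frozen adapters from earlier modalities, and an MoE-style gate
    that mixes their outputs per token (an assumption from the abstract)."""

    def __init__(self, dim: int, num_prev_modalities: int):
        super().__init__()
        self.uni = Adapter(dim)  # trainable, for the newly added modality
        self.cross = nn.ModuleList([Adapter(dim) for _ in range(num_prev_modalities)])
        for adapter in self.cross:
            adapter.requires_grad_(False)  # knowledge from earlier modalities stays frozen
        self.gate = nn.Linear(dim, num_prev_modalities + 1)  # router over all adapters

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim). Collect candidate outputs from every adapter.
        outs = [self.uni(x)] + [adapter(x) for adapter in self.cross]
        weights = torch.softmax(self.gate(x), dim=-1)       # (batch, tokens, experts)
        stacked = torch.stack(outs, dim=-1)                 # (batch, tokens, dim, experts)
        return (stacked * weights.unsqueeze(2)).sum(dim=-1)  # gated mixture, back to (B, T, D)

Under this reading, only the uni-modal adapter and the gate train on uni-modal data for the new modality, which is consistent with the abstract's claim of expanding to new modalities without joint-modal pretraining while keeping earlier knowledge stable.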