AITopics | Li, Fangyi

Collaborating Authors

Li, Fangyi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning

Gao, Shangqian, Hua, Ting, Shirkavand, Reza, Lin, Chi-Heng, Tang, Zhen, Li, Zhengao, Yuan, Longge, Li, Fangyi, Zhang, Zeyu, Ganjdanesh, Alireza, Qian, Lou, Jie, Xu, Hsu, Yen-Chang

arXiv.org Artificial IntelligenceJan-25-2025

Large Language Models (LLMs) have demonstrated remarkable abilities in tackling a wide range of complex tasks. However, their huge computational and memory costs raise significant challenges in deploying these models on resource-constrained devices or efficiently serving them. Prior approaches have attempted to alleviate these problems by permanently removing less important model structures, yet these methods often result in substantial performance degradation due to the permanent deletion of model parameters. In this work, we tried to mitigate this issue by reducing the number of active parameters without permanently removing them. Specifically, we introduce a differentiable dynamic pruning method that pushes dense models to maintain a fixed number of active parameters by converting their MLP layers into a Mixture of Experts (MoE) architecture. Our method, even without fine-tuning, consistently outperforms previous structural pruning techniques across diverse model families, including Phi-2, LLaMA-2, LLaMA-3, and Qwen-2.5.

large language model, machine learning, pruning, (18 more...)

arXiv.org Artificial Intelligence

2501.15316

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Promising Solution (0.46)

Industry:

Energy (0.68)
Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

All-in-One Tuning and Structural Pruning for Domain-Specific LLMs

Lu, Lei, Wang, Zhepeng, Bao, Runxue, Wang, Mengbing, Li, Fangyi, Wu, Yawen, Jiang, Weiwen, Xu, Jie, Wang, Yanzhi, Gao, Shangqian

arXiv.org Artificial IntelligenceDec-20-2024

Existing pruning techniques for large language models (LLMs) targeting domain-specific applications typically follow a two-stage process: pruning the pretrained general-purpose LLMs and then fine-tuning the pruned LLMs on specific domains. However, the pruning decisions, derived from the pretrained weights, remain unchanged during fine-tuning, even if the weights have been updated. Therefore, such a combination of the pruning decisions and the finetuned weights may be suboptimal, leading to non-negligible performance degradation. To address these limitations, we propose ATP: All-in-One Tuning and Structural Pruning, a unified one-stage structural pruning and fine-tuning approach that dynamically identifies the current optimal substructure throughout the fine-tuning phase via a trainable pruning decision generator. Moreover, given the limited available data for domain-specific applications, Low-Rank Adaptation (LoRA) becomes a common technique to fine-tune the LLMs. In ATP, we introduce LoRA-aware forward and sparsity regularization to ensure that the substructures corresponding to the learned pruning decisions can be directly removed after the ATP process. ATP outperforms the state-of-the-art two-stage pruning methods on tasks in the legal and healthcare domains. More specifically, ATP recovers up to 88% and 91% performance of the dense model when pruning 40% parameters of LLaMA2-7B and LLaMA3-8B models, respectively.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.14426

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology (0.93)
Education > Health & Safety > School Nutrition (0.47)
Health & Medicine > Therapeutic Area > Vaccines (0.46)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback