AITopics | Du, Yuan

Collaborating Authors

Du, Yuan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FBQuant: FeedBack Quantization for Large Language Models

Liu, Yijiang, Fang, Hengyu, He, Liulu, Zhang, Rongyu, Bai, Yichuan, Du, Yuan, Du, Li

arXiv.org Artificial IntelligenceJan-25-2025

Deploying Large Language Models (LLMs) on edge devices is increasingly important, as it eliminates reliance on network connections, reduces expensive API calls, and enhances user privacy. However, on-device deployment is challenging due to the limited computational resources of edge devices. In particular, the key bottleneck stems from memory bandwidth constraints related to weight loading. Weight-only quantization effectively reduces memory access, yet often induces significant accuracy degradation. Recent efforts to incorporate sub-branches have shown promise for mitigating quantization errors, but these methods either lack robust optimization strategies or rely on suboptimal objectives. To address these gaps, we propose FeedBack Quantization (FBQuant), a novel approach inspired by negative feedback mechanisms in automatic control. FBQuant inherently ensures that the reconstructed weights remain bounded by the quantization process, thereby reducing the risk of overfitting. To further offset the additional latency introduced by sub-branches, we develop an efficient CUDA kernel that decreases 60\% of extra inference time. Comprehensive experiments demonstrate the efficiency and effectiveness of FBQuant across various LLMs. Notably, for 3-bit Llama2-7B, FBQuant improves zero-shot accuracy by 1.2\%.

large language model, natural language, quantization, (15 more...)

arXiv.org Artificial Intelligence

2501.16385

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

He, Liulu, Zhao, Yufei, Gao, Rui, Du, Yuan, Du, Li

arXiv.org Artificial IntelligenceJul-3-2024

Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we proposes SFC, a new algebra transform for fast convolution by extending the Discrete Fourier Transform (DFT) with symbolic computing, in which only additions are required to perform the transformation at specific transform points, avoiding the calculation of irrational number and reducing the requirement for precision. Additionally, we enhance convolution efficiency by introducing correction terms to convert invalid circular convolution outputs of the Fourier method into effective ones. The numerical error analysis is presented for the first time in this type of work and proves that our algorithms can provide a 3.68x multiplication reduction for 3x3 convolution, while the Winograd algorithm only achieves a 2.25x reduction with similarly low numerical errors. Experiments carried out on benchmarks and FPGA show that our new algorithms can further improve the computation efficiency of quantized models while maintaining accuracy, surpassing both the quantization-alone method and existing works on fast convolution quantization.

artificial intelligence, data quality, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2407.02913

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

Zhang, Rongyu, Cheng, Aosong, Luo, Yulin, Dai, Gaole, Yang, Huanrui, Liu, Jiaming, Xu, Ran, Du, Li, Du, Yuan, Jiang, Yanbing, Zhang, Shanghang

arXiv.org Artificial IntelligenceMay-26-2024

Continual Test-Time Adaptation (CTTA), which aims to adapt the pre-trained model to ever-evolving target domains, emerges as an important task for vision models. As current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the human visual system's adeptness at processing both shape and texture according to the famous Trichromatic Theory, we explore the integration of a Mixture-of-Activation-Sparsity-Experts (MoASE) as an adapter for the CTTA task. Given the distinct reaction of neurons with low/high activation to domain-specific/agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components with a non-differentiable Spatial Differentiate Dropout (SDD). Based on the decomposition, we devise a multi-gate structure comprising a Domain-Aware Gate (DAG) that utilizes domain information to adaptive combine experts that process the post-SDD sparse activations of different strengths, and the Activation Sparsity Gate (ASG) that adaptively assigned feature selection threshold of the SDD for different experts for more precise feature decomposition. Finally, we introduce a Homeostatic-Proximal (HP) loss to bypass the error accumulation problem when continuously adapting the model. Extensive experiments on four prominent benchmarks substantiate that our methodology achieves state-of-the-art performance in both classification and segmentation CTTA tasks. Our code is now available at https://github.com/RoyZry98/MoASE-Pytorch.

arxiv preprint arxiv, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2405.16486

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning

Liu, Yijiang, Zhang, Rongyu, Yang, Huanrui, Keutzer, Kurt, Du, Yuan, Du, Li, Zhang, Shanghang

arXiv.org Artificial IntelligenceApr-13-2024

Large Language Models (LLMs) have demonstrated significant potential in performing multiple tasks in multimedia applications, ranging from content generation to interactive entertainment, and artistic creation. However, the diversity of downstream tasks in multitask scenarios presents substantial adaptation challenges for LLMs. While traditional methods often succumb to knowledge confusion on their monolithic dense models, Mixture-of-Experts (MoE) has been emerged as a promising solution with its sparse architecture for effective task decoupling. Inspired by the principles of human cognitive neuroscience, we design a novel framework \texttt{Intuition-MoR1E} that leverages the inherent semantic clustering of instances to mimic the human brain to deal with multitask, offering implicit guidance to router for optimized feature allocation. Moreover, we introduce cutting-edge Rank-1 Experts formulation designed to manage a spectrum of intuitions, demonstrating enhanced parameter efficiency and effectiveness in multitask LLM finetuning. Extensive experiments demonstrate that Intuition-MoR1E achieves superior efficiency and 2.15\% overall accuracy improvement across 14 public datasets against other state-of-the-art baselines.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.08985

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.54)
Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation

Zhang, Rongyu, Luo, Yulin, Liu, Jiaming, Yang, Huanrui, Dong, Zhen, Gudovskiy, Denis, Okuno, Tomoyuki, Nakata, Yohei, Keutzer, Kurt, Du, Yuan, Zhang, Shanghang

arXiv.org Artificial IntelligenceDec-27-2023

The Mixture-of-Experts (MoE) approach has demonstrated outstanding scalability in multi-task learning including low-level upstream tasks such as concurrent removal of multiple adverse weather effects. However, the conventional MoE architecture with parallel Feed Forward Network (FFN) experts leads to significant parameter and computational overheads that hinder its efficient deployment. In addition, the naive MoE linear router is suboptimal in assigning task-specific features to multiple experts which limits its further scalability. In this work, we propose an efficient MoE architecture with weight sharing across the experts. Inspired by the idea of linear feature modulation (FM), our architecture implicitly instantiates multiple experts via learnable activation modulations on a single shared expert block. The proposed Feature Modulated Expert (FME) serves as a building block for the novel Mixture-of-Feature-Modulation-Experts (MoFME) architecture, which can scale up the number of experts with low overhead. We further propose an Uncertainty-aware Router (UaR) to assign task-specific features to different FM modules with well-calibrated weights. This enables MoFME to effectively learn diverse expert functions for multiple tasks. The conducted experiments on the multi-deweather task show that our MoFME outperforms the baselines in the image restoration quality by 0.1-0.2 dB and achieves SOTA-compatible performance while saving more than 72% of parameters and 39% inference time over the conventional MoE counterpart. Experiments on the downstream segmentation and classification tasks further demonstrate the generalizability of MoFME to real open-world applications.

artificial intelligence, machine learning, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2312.1661

Country:

Asia (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation

Zhang, Rongyu, Chi, Xiaowei, Liu, Guiliang, Zhang, Wenyi, Du, Yuan, Wang, Fangxin

arXiv.org Artificial IntelligenceMar-27-2023

Multimodal learning has seen great success mining data features from multiple modalities with remarkable model performance improvement. Meanwhile, federated learning (FL) addresses the data sharing problem, enabling privacy-preserved collaborative training to provide sufficient precious data. Great potential, therefore, arises with the confluence of them, known as multimodal federated learning. However, limitation lies in the predominant approaches as they often assume that each local dataset records samples from all modalities. In this paper, we aim to bridge this gap by proposing an Unimodal Training - Multimodal Prediction (UTMP) framework under the context of multimodal federated learning. We design HA-Fedformer, a novel transformer-based model that empowers unimodal training with only a unimodal dataset at the client and multimodal testing by aggregating multiple clients' knowledge for better accuracy. The key advantages are twofold. Firstly, to alleviate the impact of data non-IID, we develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling. Secondly, to overcome the challenge of unaligned language sequence, we implement a cross-modal decoder aggregation to capture the hidden signal correlation between decoders trained by data from different modalities. Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models under the UTMP federated learning frameworks, with 15%-20% improvement on most attributes.

artificial intelligence, machine learning, modality, (16 more...)

arXiv.org Artificial Intelligence

2303.15486

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback