Yuan, Mu
A-VL: Adaptive Attention for Large Vision-Language Models
Zhang, Junyang, Yuan, Mu, Zhong, Ruiguang, Luo, Puhan, Zhan, Huiyou, Zhang, Ningkang, Hu, Chengchen, Li, Xiangyang
Large Vision-Language Models (LVLMs) integrate computer vision and natural language processing techniques and offer substantial application potential, but they demand extensive resources during inference. Adaptive attention techniques can dynamically reduce computational redundancy and thus improve efficiency. Although current adaptive attention methods significantly reduce the memory requirements of Transformer-based language models, they are not tailored for LVLMs. We observe that LVLMs generate responses from both remote image tokens and local text tokens, and that the two modalities exhibit different attention patterns. This observation inspires us to manage the attention for each modality separately: for visual input, we cache potentially useful information but compute only the most critical parts; for language input, we focus on local information. Based on this observation and our analysis of vision-language attention patterns, we develop A-VL, a plug-and-play adaptive attention mechanism tailored for LVLM inference. Extensive evaluations on three vision-language tasks and five datasets show the effectiveness of our design. A-VL outperforms existing adaptive attention methods, reducing memory usage and computational load without compromising performance.
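The per-modality idea can be illustrated with a rough sketch: keep the full visual KV cache but attend only to the highest-scoring image positions, while text tokens use a recent local window. The top-k/window criteria, tensor shapes, and the function name `adaptive_attention` below are illustrative assumptions, not A-VL's exact selection rules.

```python
import torch
import torch.nn.functional as F

def adaptive_attention(q, k, v, is_visual, top_k=64, local_window=128):
    """Sketch of per-modality KV selection for one decoding step.

    q: (heads, 1, d)      current query
    k, v: (heads, T, d)   cached keys/values
    is_visual: (T,) bool  True at image-token positions
    """
    scores = (q @ k.transpose(-1, -2)) / k.shape[-1] ** 0.5      # (heads, 1, T)
    importance = scores.mean(dim=0).squeeze(0)                   # (T,)

    # Visual cache: keep everything stored, but attend only to the
    # top-k most important image positions (hypothetical criterion).
    vis_idx = torch.nonzero(is_visual, as_tuple=False).squeeze(-1)
    if vis_idx.numel() > top_k:
        vis_idx = vis_idx[importance[vis_idx].topk(top_k).indices]

    # Text cache: attend to a local window of the most recent text tokens.
    txt_idx = torch.nonzero(~is_visual, as_tuple=False).squeeze(-1)[-local_window:]

    keep = torch.cat([vis_idx, txt_idx]).sort().values
    attn = F.softmax(scores[..., keep], dim=-1)
    return attn @ v[:, keep]
```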
Secure Transformer Inference
Yuan, Mu, Zhang, Lan, Li, Xiang-Yang
Applications of Transformer models are exploding, e.g., ChatGPT [1]. Security is critical to Transformer-based services: it determines whether applications can scale to privacy-sensitive areas such as cloud copilots for proprietary code and documents [2]. Existing work [3, 4] studies this problem under the classic secure multi-party computation framework, but its encryption and decryption methods require approximating complex nonlinear layers and introduce heavy computational overhead. In this work, we propose a three-party protocol that uses permutation to protect both model parameters and user data without any approximation of the Transformer model.
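The observation behind a permutation-based protocol can be sketched as follows: a secret feature permutation commutes with linear layers (after conjugating the weights) and with element-wise nonlinearities, so a party holding only permuted activations and permuted weights still computes the correct, permuted result. This is a minimal illustration of that algebra under assumed roles, not the paper's full three-party protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))          # user activations (4 tokens, d features)
W = rng.normal(size=(d, d))          # model weight

# Secret feature permutation pi (hypothetical setup; the actual
# three-party roles and key distribution follow the paper).
pi = np.eye(d)[rng.permutation(d)]

# The computing party sees only permuted activations and permuted weights.
x_perm = x @ pi
W_perm = pi.T @ W @ pi
y_perm = np.maximum(x_perm @ W_perm, 0)   # element-wise ReLU commutes with pi

# Undoing the permutation recovers the true result exactly (no approximation).
y = np.maximum(x @ W, 0)
assert np.allclose(y_perm @ pi.T, y)
```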
MLink: Linking Black-Box Models from Multiple Domains for Collaborative Inference
Yuan, Mu, Zhang, Lan, Zheng, Zimu, Zhang, Yi-Nan, Li, Xiang-Yang
The cost efficiency of model inference is critical to real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices. A typical dilemma is that providing complex intelligent services (e.g., smart city) requires the inference results of multiple ML models, while the cost budget (e.g., GPU memory) is insufficient to run all of them. In this work, we study the underlying relationships among black-box ML models and propose a novel learning task: model linking, which aims to bridge the knowledge of different black-box models by learning mappings (dubbed model links) between their output spaces. We propose a model-link design that supports linking heterogeneous black-box ML models and, to address the distribution-discrepancy challenge, present adaptation and aggregation methods for model links. Based on these model links, we develop a scheduling algorithm named MLink. Through collaborative multi-model inference enabled by model links, MLink improves the accuracy of the obtained inference results under the cost budget. We evaluate MLink on a multi-modal dataset with seven ML models and on two real-world video analytics systems with six ML models and 3,264 hours of video. Experimental results show that model links can be effectively built among various black-box models. Under a GPU memory budget, MLink saves 66.7% of inference computation while preserving 94% inference accuracy, outperforming multi-task learning, deep reinforcement learning-based scheduling, and frame-filtering baselines.
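A model link can be sketched as a small learned mapping from one black-box model's output space to another's, trained on a handful of inputs where both models were run. The two-layer MLP, dimensions, and training loop below are illustrative assumptions, not MLink's exact link architecture or scheduler.

```python
import torch
import torch.nn as nn

# Model link g: output space of black-box model A -> output space of model B.
dim_a, dim_b = 10, 5
link = nn.Sequential(nn.Linear(dim_a, 32), nn.ReLU(), nn.Linear(32, dim_b))
opt = torch.optim.Adam(link.parameters(), lr=1e-3)

out_a = torch.randn(256, dim_a)          # stand-in for model A's outputs
out_b = torch.randn(256, dim_b)          # stand-in for model B's outputs

# Fit the link on paired outputs collected where both models were executed.
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(link(out_a), out_b)
    loss.backward()
    opt.step()

# At inference time, run only model A and approximate B's output via the link,
# so B need not occupy GPU memory when the budget is tight.
approx_b = link(out_a[:1])
```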
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Yuan, Mu, Zhang, Lan, He, Fengxiang, Tong, Xueting, Song, Miao-Hui, Xu, Zhengyuan, Li, Xiang-Yang
Mobile-centric AI applications have high requirements for the resource efficiency of model inference. Input filtering is a promising approach to eliminating redundancy and thus reducing inference cost. Previous efforts have tailored effective solutions for many applications but left two essential questions unanswered: (1) the theoretical filterability of an inference workload, which would guide the application of input filtering techniques and avoid trial-and-error costs for resource-constrained mobile applications; and (2) robust discriminability of feature embeddings, which would allow input filtering to be widely effective for diverse inference tasks and input content. To answer them, we first formalize the input filtering problem and theoretically compare the hypothesis complexity of inference models and input filters to understand the optimization potential. We then propose the first end-to-end learnable input filtering framework, which covers most state-of-the-art methods and surpasses them in feature embedding with robust discriminability. We design and implement InFi, which supports six input modalities and multiple mobile-centric deployments. Comprehensive evaluations confirm our theoretical results and show that InFi outperforms strong baselines in applicability, accuracy, and efficiency. InFi achieves 8.5x throughput and saves 95% bandwidth, while keeping over 90% accuracy, for a video analytics application on mobile platforms.
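An input filter can be sketched as a lightweight embedding network with a binary "worth running inference?" head placed in front of the expensive model. The architecture, feature dimensions, and the 0.5 threshold below are illustrative assumptions rather than InFi's exact end-to-end design.

```python
import torch
import torch.nn as nn

class InputFilter(nn.Module):
    """Lightweight filter: embed the input, then score whether to forward it."""
    def __init__(self, in_dim=2048, emb_dim=64):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.embed(x)))

filt = InputFilter()
frames = torch.randn(32, 2048)            # stand-in for pre-extracted frame features
keep = filt(frames).squeeze(-1) > 0.5     # only these go to the expensive model
print(f"forwarding {int(keep.sum())}/32 inputs to the inference model")
```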