AITopics | Li, Wenshuo

Li, Wenshuo

Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Model

Feng, Qianhan, Li, Wenshuo, Lin, Tong, Chen, Xinghao

arXiv.org Artificial Intelligence, Dec-2-2024

Vision-Language Models (VLMs) bring powerful understanding and reasoning capabilities to multimodal tasks. Meanwhile, there is a growing need for capable artificial intelligence on mobile devices, such as AI assistant software. Some efforts try to migrate VLMs to edge devices to expand their application scope. Simplifying the model structure is a common method, but as the model shrinks, the trade-off between performance and size becomes increasingly difficult. Knowledge distillation (KD) can help models improve comprehensive capabilities without increasing size or data volume. However, most existing large-model distillation techniques only consider single-modal LLMs, or only use teachers to create new data environments for students. None of these methods takes into account the distillation of the most important cross-modal alignment knowledge in VLMs. We propose a method called Align-KD to guide the student model to learn the cross-modal matching that occurs at the shallow layer. The teacher also helps the student learn the projection of vision tokens into the text embedding space based on the focus of the text. Under the guidance of Align-KD, the 1.7B MobileVLM V2 model can learn rich knowledge from the 7B teacher model with a light design of training loss, and achieve an average score improvement of 2.0 across 6 benchmarks under two training subsets respectively. Code is available at: https://github.com/fqhank/Align-KD.
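For readers unfamiliar with knowledge distillation, the following is a minimal sketch of the generic logit-matching KD loss (KL divergence between temperature-softened teacher and student distributions). It is not the Align-KD alignment loss described in the abstract, which targets cross-modal attention and vision-token projection; the function names `softmax` and `kd_loss` and the default temperature are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KD loss: KL(teacher || student) on softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. Hypothetical helper, not Align-KD.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge.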

distillation, large language model, machine learning

arXiv: 2412.01282

Country:

North America > United States > Hawaii (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.50)

Industry: Education (0.89)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Li, Wenshuo, Chen, Xinghao, Shu, Han, Tang, Yehui, Wang, Yunhe

arXiv.org Artificial Intelligence, Jun-17-2024

Large language models (LLMs) have recently attracted significant attention in the field of artificial intelligence. However, training these models poses significant challenges in terms of computational and storage capacity, so compressing checkpoints has become an urgent problem. In this paper, we propose a novel Extreme Checkpoint Compression (ExCP) framework, which significantly reduces the storage required for training checkpoints while achieving nearly lossless performance. We first calculate the residuals of adjacent checkpoints to obtain the essential but sparse information, which allows a higher compression ratio. To further exploit the redundant parameters in checkpoints, we then propose a weight-momentum joint shrinking method that makes use of another important source of information during model optimization, i.e., momentum. In particular, we exploit the information of both model and optimizer to discard as many parameters as possible while preserving critical information to ensure optimal performance. Furthermore, we utilize non-uniform quantization to further compress the storage of checkpoints. We extensively evaluate our proposed ExCP framework on several models ranging from 410M to 7B parameters and demonstrate significant storage reduction while maintaining strong performance. For instance, we achieve approximately $70\times$ compression for the Pythia-410M model, with the final performance being as accurate as the original model on various downstream tasks. Code will be available at https://github.com/Gaffey/ExCP.
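The pipeline named in the abstract (residuals of adjacent checkpoints, discarding parameters, non-uniform quantization) can be illustrated with a toy sketch. This is an assumption-laden simplification: the plain magnitude threshold below stands in for the paper's weight-momentum joint shrinking criterion, which the abstract does not specify, and the codebook, function names, and numbers are hypothetical.

```python
def residual(prev_ckpt, curr_ckpt):
    """Element-wise difference between two adjacent checkpoints."""
    return [c - p for p, c in zip(prev_ckpt, curr_ckpt)]

def prune(values, threshold):
    """Zero out small entries (stand-in for joint weight-momentum shrinking)."""
    return [v if abs(v) >= threshold else 0.0 for v in values]

def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry.

    The codebook may be unevenly spaced, which is what makes the
    quantization non-uniform.
    """
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - v))
            for v in values]

def compress(prev_ckpt, curr_ckpt, threshold, codebook):
    """Residual -> prune -> quantize; returns small integer indices."""
    return quantize(prune(residual(prev_ckpt, curr_ckpt), threshold), codebook)

def restore(prev_ckpt, indices, codebook):
    """Rebuild an approximate checkpoint from the previous one plus indices."""
    return [p + codebook[i] for p, i in zip(prev_ckpt, indices)]

# Toy data, not taken from the paper.
prev = [0.50, -1.00, 0.25, 2.00]
curr = [0.51, -0.80, 0.25, 1.70]
codebook = [0.0, -0.3, -0.1, 0.1, 0.2]  # hypothetical, unevenly spaced

indices = compress(prev, curr, threshold=0.05, codebook=codebook)
approx = restore(prev, indices, codebook)
```

Only the integer indices and the codebook need to be stored per checkpoint; most indices point at the zero entry, so the stream is sparse and compresses well.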

large language model, machine learning, natural language

arXiv: 2406.11257

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Multi-shot NAS for Discovering Adversarially Robust Convolutional Neural Architectures at Targeted Capacities

Ning, Xuefei, Zhao, Junbo, Li, Wenshuo, Zhao, Tianchen, Yang, Huazhong, Wang, Yu

arXiv.org Artificial Intelligence, Jan-1-2021

Convolutional neural networks (CNNs) are vulnerable to adversarial examples, and studies show that increasing the model capacity of an architecture topology (e.g., width expansion) can bring consistent robustness improvements. This reveals a clear robustness-efficiency trade-off that should be considered in architecture design. Recent studies have employed one-shot neural architecture search (NAS) to discover adversarially robust architectures. However, since the capacities of different topologies cannot be easily aligned during the search process, current one-shot NAS methods might favor topologies with larger capacity in the supernet, and the discovered topology might be sub-optimal when aligned to the targeted capacity. This paper proposes a novel multi-shot NAS method to explicitly search for adversarially robust architectures at a certain targeted capacity. Specifically, we estimate the reward at the targeted capacity using interior extra-polation of the rewards from multiple supernets. Experimental results demonstrate the effectiveness of the proposed method. For instance, at the targeted FLOPs of 1560M, the discovered MSRobNet-1560 (clean 84.8%, PGD100 52.9%) outperforms the recent NAS-discovered architecture RobNet-free (clean 82.8%, PGD100 52.6%) with similar FLOPs. Code is available at https://github.com/walkerning/aw_nas.
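The idea of estimating a reward at a targeted capacity from several supernets can be sketched as a simple curve fit. The abstract does not state the functional form the authors use, so the least-squares straight line below is purely an assumption; `estimate_reward_at` and the sample numbers are hypothetical and are not results from the paper.

```python
def estimate_reward_at(target_capacity, points):
    """Fit a least-squares line through (capacity, reward) pairs and
    evaluate it at target_capacity.

    Each pair is assumed to come from evaluating the same topology in a
    supernet of a different capacity. The linear form is an assumption.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct capacities")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y + slope * (target_capacity - mean_x)

# Hypothetical rewards of one topology at two supernet capacities (MFLOPs).
observed = [(1000.0, 0.50), (2000.0, 0.54)]
estimate = estimate_reward_at(1500.0, observed)
```

With more than two supernets the same function averages out noise in the individual reward measurements instead of passing exactly through each point.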

architecture, deep learning, neural network

arXiv: 2012.11835

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
