Dong, Peijie
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models
Zhang, Longteng, Liu, Xiang, Li, Zeyu, Pan, Xinglin, Dong, Peijie, Fan, Ruibo, Guo, Rui, Wang, Xin, Luo, Qiong, Shi, Shaohuai, Chu, Xiaowen
Large Language Models (LLMs) have seen great advances in both academia and industry, and their popularity has led to numerous open-source frameworks and techniques for accelerating LLM pre-training, fine-tuning, and inference. Training and deploying LLMs are expensive, as they require considerable computing resources and memory; hence, many efficient approaches have been developed to improve system pipelines as well as operators. However, runtime performance can vary significantly across hardware and software stacks, which makes it difficult to choose the best configuration. In this work, we benchmark the performance from both macro and micro perspectives. First, we benchmark the end-to-end performance of pre-training, fine-tuning, and serving LLMs of different sizes, i.e., 7, 13, and 70 billion parameters (7B, 13B, and 70B), on three 8-GPU platforms with and without individual optimization techniques, including ZeRO, quantization, recomputation, and FlashAttention. Then, we dive deeper to provide a detailed runtime analysis of the sub-modules, including computing and communication operators in LLMs. For end users, our benchmark and findings help in understanding different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs. For researchers, our in-depth module-wise analyses uncover potential opportunities for future work to further optimize the runtime performance of LLMs.
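As an illustration of the kind of micro-benchmarking described above, the sketch below times forward and backward passes of a single Transformer layer in PyTorch. The layer, tensor sizes, and the `benchmark_step` helper are illustrative stand-ins, not the paper's actual harness.

```python
# Minimal sketch of module-level throughput timing in PyTorch (illustrative, not
# the paper's benchmark harness). A small TransformerEncoderLayer stands in for
# an LLM block; on real hardware you would wrap a full model and its optimizer.
import time
import torch
import torch.nn as nn

def benchmark_step(module, batch, n_warmup=5, n_iters=20):
    """Return mean latency (s) and tokens/s for forward+backward of one module."""
    device = batch.device
    for _ in range(n_warmup):                     # warm up kernels and allocator
        module(batch).sum().backward()
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        module(batch).sum().backward()
    if device.type == "cuda":
        torch.cuda.synchronize()                  # wait for asynchronous CUDA kernels
    latency = (time.perf_counter() - start) / n_iters
    tokens = batch.shape[0] * batch.shape[1]
    return latency, tokens / latency

if __name__ == "__main__":
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    layer = nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True).to(dev)
    x = torch.randn(8, 512, 1024, device=dev)     # (batch, sequence, hidden)
    lat, tps = benchmark_step(layer, x)
    print(f"latency {lat * 1e3:.1f} ms, throughput {tps:,.0f} tokens/s")
```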
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Dong, Peijie, Li, Lujun, Wei, Zimian, Niu, Xin, Tian, Zhiliang, Pan, Hengyue
Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search for optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improved search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address this gap, we first build MQ-Bench-101, which covers different bit configurations and their quantization results. We then observe that the existing training-free proxies correlate only weakly with quantization accuracy on MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic proxy-search framework for MQ via evolutionary algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolutionary search to discover the best-correlated MQ proxy. We propose a diversity-prompting selection strategy and a compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the automatic generation of proxies without heavy tuning or expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that EMQ outperforms state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
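The following toy sketch illustrates the general idea of evolving proxies scored by their rank correlation with recorded quantization accuracies. The primitive set, the synthetic benchmark, and helpers such as `evolve` are assumptions for illustration, not the EMQ implementation or MQ-Bench-101 itself.

```python
# Toy sketch of evolutionary proxy search in the spirit of EMQ (illustrative only).
# A candidate proxy picks one weight statistic per layer; each candidate is scored
# by Kendall's tau between its ranking of bit-width configurations and the recorded
# quantization accuracy (a synthetic stand-in for MQ-Bench-101 entries).
import random
import numpy as np
from scipy.stats import kendalltau

PRIMITIVES = {
    "l1":  lambda w: np.abs(w).mean(),
    "l2":  lambda w: np.square(w).mean(),
    "var": lambda w: w.var(),
    "max": lambda w: np.abs(w).max(),
}

def score_config(proxy_ops, weights, bits):
    """Proxy score of one bit-width configuration: per-layer statistic scaled by its bit-width."""
    return sum(PRIMITIVES[op](w) * b for op, w, b in zip(proxy_ops, weights, bits))

def fitness(proxy_ops, benchmark):
    """Rank correlation between proxy scores and measured accuracies over the benchmark."""
    scores = [score_config(proxy_ops, weights, bits) for weights, bits, _ in benchmark]
    accs = [acc for _, _, acc in benchmark]
    tau, _ = kendalltau(scores, accs)
    return 0.0 if np.isnan(tau) else tau

def evolve(benchmark, n_layers, pop_size=20, generations=30, mutate_p=0.3):
    ops = list(PRIMITIVES)
    pop = [[random.choice(ops) for _ in range(n_layers)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, benchmark), reverse=True)
        survivors = pop[: pop_size // 2]                      # keep the best half
        children = [[random.choice(ops) if random.random() < mutate_p else g
                     for g in parent] for parent in survivors]
        pop = survivors + children                            # mutation-only reproduction
    return max(pop, key=lambda ind: fitness(ind, benchmark))

if __name__ == "__main__":
    # Synthetic benchmark: 3 layers, random weights, random bit configs, fake accuracies.
    rng = np.random.default_rng(0)
    bench = [([rng.standard_normal(64) for _ in range(3)],
              rng.integers(2, 9, size=3).tolist(),
              float(rng.uniform(60, 75))) for _ in range(32)]
    best = evolve(bench, n_layers=3)
    print("best proxy per layer:", best, "tau:", round(fitness(best, bench), 3))
```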
DisWOT: Student Architecture Search for Distillation WithOut Training
Dong, Peijie, Li, Lujun, Wei, Zimian
Knowledge distillation (KD) is an effective training strategy for improving lightweight student models under the guidance of cumbersome teachers. However, large architectural differences across teacher-student pairs limit the distillation gains. In contrast to previous adaptive distillation methods that reduce the teacher-student gap, we explore a novel training-free framework to search for the best student architecture for a given teacher. We first show empirically that the optimal model under vanilla training is not necessarily the winner in distillation. Second, we find that the similarity of feature semantics and sample relations between randomly initialized teacher-student networks correlates well with final distillation performance. Thus, we efficiently measure similarity matrices conditioned on the semantic activation maps to select the optimal student via an evolutionary algorithm without any training. In this way, our student architecture search for Distillation WithOut Training (DisWOT) significantly improves the performance of the model in the distillation stage with at least 180$\times$ training acceleration. Additionally, we extend the similarity metrics in DisWOT as new distillers and KD-based zero proxies. Our experiments on CIFAR, ImageNet, and NAS-Bench-201 demonstrate that our technique achieves state-of-the-art results on different search spaces. Our project and code are available at https://lilujunai.github.io/DisWOT-CVPR2023/.
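A minimal sketch of a training-free, DisWOT-style score follows, assuming sample-relation (Gram) matrices computed over one forward pass of randomly initialized networks. The `diswot_like_score` helper and the toy backbones are illustrative, not the authors' code.

```python
# Illustrative sketch of a DisWOT-style training-free score: compare the
# sample-relation (Gram) matrices of teacher and student activations on one
# forward pass at random initialization. Higher similarity -> better candidate.
import torch
import torch.nn.functional as F

def relation_matrix(feat):
    """Pairwise sample similarity from a (batch, C, H, W) feature map."""
    flat = F.normalize(feat.flatten(1), dim=1)   # (batch, C*H*W), unit-normalized
    return flat @ flat.t()                       # (batch, batch) Gram matrix

@torch.no_grad()
def diswot_like_score(teacher, student, batch):
    """Negative Frobenius distance between teacher and student relation matrices."""
    rt = relation_matrix(teacher(batch))
    rs = relation_matrix(student(batch))
    return -torch.norm(rt - rs, p="fro").item()

if __name__ == "__main__":
    # Toy conv nets stand in for real teacher/student backbones.
    teacher = torch.nn.Sequential(torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU())
    candidates = [torch.nn.Sequential(torch.nn.Conv2d(3, c, 3, padding=1), torch.nn.ReLU())
                  for c in (16, 32, 48)]
    x = torch.randn(8, 3, 32, 32)
    scores = [diswot_like_score(teacher, s, x) for s in candidates]
    print("best candidate index:", int(torch.tensor(scores).argmax()))
```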
Multi-trial Neural Architecture Search with Lottery Tickets
Wei, Zimian, Pan, Hengyue, Li, Lujun, Lu, Menglong, Niu, Xin, Dong, Peijie, Li, Dongsheng
Neural architecture search (NAS) has brought significant progress to recent image recognition tasks. Most existing NAS methods apply restricted search spaces, which limits the upper-bound performance of the searched models. To address this issue, we propose a new search space named MobileNet3-MT. By reducing human prior knowledge across all dimensions of the networks, MobileNet3-MT accommodates more potential candidates. To search in this challenging space, we present an efficient Multi-trial Evolution-based NAS method termed MENAS. Specifically, we accelerate the evolutionary search process by gradually pruning models in the population. Each model is trained with early stopping and replaced by its lottery ticket (the explored optimal pruned network). In this way, the full training pipeline of cumbersome networks is avoided and more efficient networks are automatically generated. Extensive experimental results on ImageNet-1K, CIFAR-10, and CIFAR-100 demonstrate that MENAS achieves state-of-the-art performance.
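The lottery-ticket replacement step could look roughly like the sketch below: global magnitude pruning of an early-stopped model, with the surviving weights rewound to their initial values. The `lottery_ticket` helper and the toy network are assumptions for illustration, not the MENAS code.

```python
# Sketch of a lottery-ticket style replacement step (illustrative): prune the
# smallest-magnitude weights of an early-stopped model and rewind the surviving
# weights to their initial values, yielding a cheaper candidate for the population.
import copy
import torch
import torch.nn as nn

def lottery_ticket(model, init_state, prune_ratio=0.5):
    """Return a pruned copy of `model` with kept weights rewound to `init_state`."""
    ticket = copy.deepcopy(model)
    ticket.load_state_dict(init_state)                    # rewind to initialization
    # Global magnitude threshold over all weight tensors of the trained model.
    all_w = torch.cat([p.detach().abs().flatten()
                       for n, p in model.named_parameters() if "weight" in n])
    threshold = torch.quantile(all_w, prune_ratio)
    with torch.no_grad():
        for (n, trained), (_, rewound) in zip(model.named_parameters(),
                                              ticket.named_parameters()):
            if "weight" in n:
                mask = trained.detach().abs() > threshold
                rewound.mul_(mask.to(rewound.dtype))      # zero out pruned connections
    return ticket

if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    init_state = copy.deepcopy(net.state_dict())          # saved before training
    # ... early-stopped training of `net` would happen here ...
    pruned = lottery_ticket(net, init_state, prune_ratio=0.5)
    kept = sum((p != 0).sum().item() for p in pruned.parameters())
    print("non-zero parameters in ticket:", kept)
```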
DMFormer: Closing the Gap Between CNN and Vision Transformers
Wei, Zimian, Pan, Hengyue, Li, Lujun, Lu, Menglong, Niu, Xin, Dong, Peijie, Li, Dongsheng
Vision transformers have shown excellent performance in computer vision tasks. Since the computation cost of their self-attention mechanism is high, recent works have tried to replace the self-attention mechanism in vision transformers with convolutional operations, which are more efficient and carry a built-in inductive bias. However, these efforts either ignore multi-level features or lack dynamic properties, leading to sub-optimal performance. In this paper, we propose a Dynamic Multi-level Attention mechanism (DMA), which captures different patterns of the input images with multiple kernel sizes and enables input-adaptive weights via a gating mechanism. Based on DMA, we present an efficient backbone network named DMFormer. DMFormer adopts the overall architecture of vision transformers while replacing the self-attention mechanism with our proposed DMA. Extensive experimental results on the ImageNet-1K and ADE20K datasets demonstrate that DMFormer achieves state-of-the-art performance, outperforming similar-sized vision transformers (ViTs) and convolutional neural networks (CNNs).
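One plausible reading of the DMA block is sketched below, assuming parallel depthwise convolutions with different kernel sizes fused by input-conditioned gating weights. The `DynamicMultiLevelAttention` class is an interpretation of the abstract, not the authors' implementation.

```python
# Interpretation of a DMA-style block (not the authors' code): parallel depthwise
# convolutions with different kernel sizes capture multi-level patterns, and a
# gating branch produces input-adaptive weights to fuse them.
import torch
import torch.nn as nn

class DynamicMultiLevelAttention(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        # Gating: global context -> one weight per branch, per sample.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, len(kernel_sizes), 1),
            nn.Softmax(dim=1),
        )
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        weights = self.gate(x).unsqueeze(2)                        # (B, K, 1, 1, 1)
        fused = (feats * weights).sum(dim=1)                       # weighted sum over branches
        return self.proj(fused) + x                                # residual connection

if __name__ == "__main__":
    dma = DynamicMultiLevelAttention(channels=64)
    out = dma(torch.randn(2, 64, 56, 56))
    print(out.shape)   # torch.Size([2, 64, 56, 56])
```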