Birder: Communication-Efficient 1-bit Adaptive Optimizer for Practical Distributed DNN Training
Therefore, from a system-level perspective, the design ethos of a system-efficient communication-compression algorithm is twofold: its compression and decompression must be computationally light and fast, and it must remain compatible with efficient collective communication primitives.
Various gradient compression algorithms have been proposed to alleviate the communication bottleneck in distributed learning, and they have demonstrated effectiveness in terms of high compression ratios and low theoretical communication complexity. However, when it comes to practically training modern deep neural networks (DNNs), these algorithms have yet to match the inference performance of uncompressed SGD-momentum (SGDM) and adaptive optimizers (e.g., Adam). More importantly, recent studies suggest that these algorithms actually offer no speed advantages over SGDM/Adam when used with common distributed DNN training frameworks (e.g., DistributedDataParallel (DDP)) in typical settings, due to heavy compression/decompression computation, incompatibility with the efficient All-Reduce primitive, or the requirement of uncompressed warmup at the early stage. For these reasons, we propose a novel 1-bit adaptive optimizer, dubbed *Bi*nary *r*andomization a*d*aptive optimiz*er* (**Birder**). The quantization of Birder can be computed easily and cheaply, and it does not require warmup with its uncompressed version at the beginning. We also devise Hierarchical-1-bit-All-Reduce to further lower the communication volume. We theoretically prove that Birder promises the same convergence rate as Adam. Extensive experiments, conducted on 8 to 64 GPUs (1 to 8 nodes) using DDP, demonstrate that Birder achieves inference performance comparable to uncompressed SGDM/Adam, with up to ${2.5 \times}$ speedup for training ResNet-50 and ${6.3\times}$ speedup for training BERT-Base. Code is publicly available at https://openi.pcl.ac.cn/c2net_optim/Birder.
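The "binary randomization" in the name suggests stochastic 1-bit quantization. The following is a minimal illustrative sketch of that general idea — a per-tensor scale plus randomized sign bits that are unbiased in expectation for entries within one scale — not Birder's actual quantizer; the function names and the mean-absolute-value scale are assumptions for illustration.

```python
import numpy as np

def compress_1bit(grad, rng):
    """Stochastically binarize a tensor to {-scale, +scale}.

    After bit-packing the signs, each float32 entry costs 1 bit,
    roughly a 32x reduction in communication volume.
    """
    scale = np.abs(grad).mean()
    # Probability of emitting +scale grows with how positive the entry is,
    # so the quantizer is unbiased for entries with |g| <= scale.
    p = np.clip(0.5 + grad / (2 * scale + 1e-12), 0.0, 1.0)
    signs = np.where(rng.random(grad.shape) < p, 1.0, -1.0)
    return signs.astype(np.int8), scale

def decompress_1bit(signs, scale):
    return signs.astype(np.float32) * scale

rng = np.random.default_rng(0)
g = rng.normal(size=100_000).astype(np.float32)
signs, scale = compress_1bit(g, rng)
g_hat = decompress_1bit(signs, scale)
```

Both operations are a handful of elementwise kernels, which is what the "computationally light" design ethos above asks for, and the int8 sign tensor sums correctly under All-Reduce.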
Rethinking Deep Neural Network Ownership Verification: Embedding Passports to Defeat Ambiguity Attacks
With substantial amounts of time, resources, and human (team) effort invested to explore and develop successful deep neural networks (DNNs), there emerges an urgent need to protect these inventions from being illegally copied, redistributed, or abused without respecting the intellectual property of legitimate owners. Following recent progress along this line, we investigate a number of watermark-based DNN ownership verification methods in the face of ambiguity attacks, which aim to cast doubt on ownership verification by forging counterfeit watermarks. It is shown that ambiguity attacks pose serious threats to existing DNN watermarking methods. As a remedy to the above-mentioned loophole, this paper proposes novel passport-based DNN ownership verification schemes that are both robust to network modifications and resilient to ambiguity attacks. The gist of embedding digital passports is to design and train DNN models in a way such that the inference performance of the DNN on its original task deteriorates significantly under forged passports. In other words, genuine passports are not only verified by looking for the predefined signatures, but also reasserted by the unyielding inference performance of the DNN model. Extensive experimental results justify the effectiveness of the proposed passport-based DNN ownership verification schemes. Code and models are available at https://github.com/kamwoh/DeepIPR
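The mechanism behind passport embedding can be illustrated with a toy sketch: a layer's scale factors are derived from the passport itself, so a forged passport yields different scales and thus different outputs. This is only a hedged illustration of the idea with an assumed linear-layer formula and made-up shapes, not the paper's actual passport layer.

```python
import numpy as np

def passport_scale(weight, passport):
    """Derive a per-output-channel scale from a passport:
    gamma = avg(W @ passport). Only the genuine passport reproduces
    the scales the model was trained with."""
    return (weight @ passport).mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))          # toy linear layer
genuine = rng.normal(size=(32, 4))     # secret passport batch
x = rng.normal(size=(32, 8))           # toy inputs

gamma_true = passport_scale(W, genuine)
y_true = gamma_true * (W @ x)          # inference with the genuine passport

forged = rng.normal(size=(32, 4))      # attacker's counterfeit passport
gamma_fake = passport_scale(W, forged)
y_fake = gamma_fake * (W @ x)
# The scales, and hence the outputs, diverge under the forged passport,
# which is what degrades the original-task performance in a trained model.
```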
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
Guozhi Liu, Qi Mu, Tiansheng Huang, Xinhua Wang, Li Shen, Weiwei Lin, Zhang Li
Harmful fine-tuning presents significant safety challenges for fine-tuning-as-a-service in large language models. Existing alignment-stage defenses, e.g., Vaccine, RepNoise, Booster, and T-Vaccine, mitigate harmful fine-tuning by enhancing the model's robustness during the alignment phase. While these methods have been proposed to mitigate the issue, they often overlook a critical upstream factor: the role of the original safety-alignment data. We observe that their defense performance and computational efficiency remain constrained by the quality and composition of the alignment dataset. To address this limitation, we propose Pharmacist, a safety-alignment data curation solution that enhances defense against harmful fine-tuning by selecting a high-quality and safety-critical core subset from the original alignment data. The core idea of Pharmacist is to train an alignment data selector to rank alignment data: it up-ranks high-quality, safety-critical alignment data and down-ranks low-quality, non-safety-critical data. Empirical results indicate that models trained on datasets selected by Pharmacist outperform those trained on datasets selected by existing selection methods in both defense and inference performance. In addition, Pharmacist can be effectively integrated with mainstream alignment-stage defense methods. For example, when applied to RepNoise and T-Vaccine, using the dataset selected by Pharmacist instead of the full dataset improves defense performance by 2.60\% and 3.30\%, respectively, and enhances inference performance by 3.50\% and 1.10\%. Notably, it reduces training time by 56.83\% and 57.63\%, respectively. Our code is available at https://github.com/Lslland/Pharmacist.
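The rank-and-select step can be sketched as follows. This is a hedged toy: the linear scorer `selector_w`, the feature matrix, and the top-k rule stand in for Pharmacist's trained selector and are not its actual model.

```python
import numpy as np

def select_core_subset(features, selector_w, k):
    """Rank alignment examples by a selector score and keep the top-k
    as the high-quality, safety-critical core subset."""
    scores = features @ selector_w       # higher = safer / higher quality
    order = np.argsort(-scores)          # descending by score
    return order[:k], scores

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64))      # embeddings of alignment examples
w = rng.normal(size=64)                  # toy stand-in for a trained selector
keep, scores = select_core_subset(feats, w, k=250)
# An alignment-stage defense would then train on feats[keep] instead of the
# full set, which is where the reported training-time savings come from.
```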
- Asia > China > Guangdong Province > Guangzhou (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States (0.04)
- Asia > China > Shandong Province (0.04)
- Health & Medicine > Therapeutic Area > Vaccines (0.74)
- Health & Medicine > Therapeutic Area > Immunology (0.64)
- North America > Canada (0.04)
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Common Q1: Theoretical justification on why AWP works. Based on previous work on the PAC-Bayes bound (Neyshabur et al., NeurIPS 2017), in adversarial training, let
R#1 Q1: The weights are constantly perturbed in the worst case; the model may find it difficult to learn.
R#1 Q2: How do the baseline methods that do implicit weight perturbations differ from AWP? We did not claim that "baseline methods do the implicit weight perturbations".
R#1 Q3: What is the difference between the weights learned by AT-AWP and vanilla AT?
R#2 Q1: Only CIFAR-10 and single neural networks are tested. We have tested several network architectures and datasets in the main body and appendix, e.g., PreAct ResNet-18,
R#2 Q2: In Figure 1, is the α value in the loss landscape embedded into training or post-training?
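For context on what AWP (adversarial weight perturbation) computes, here is a minimal sketch of one update on a toy loss: perturb the weights in the loss-increasing direction (scaled by the weight norm, as AWP does layer-wise), evaluate the gradient there, and update the unperturbed weights. The step sizes and the toy quadratic loss are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def awp_step(w, grad_fn, gamma=0.01, lr=0.1):
    """One adversarial-weight-perturbation update (illustrative sketch)."""
    g = grad_fn(w)
    # Perturbation toward higher loss, scaled relative to the weight norm.
    delta = gamma * np.linalg.norm(w) * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad_fn(w + delta)      # gradient at the perturbed weights
    return w - lr * g_adv           # update the *unperturbed* weights

# Toy quadratic loss L(w) = 0.5 * ||w||^2, so grad_fn(w) = w.
grad = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = awp_step(w, grad)
# The iterate still converges toward the minimizer, addressing R#1 Q1's
# worry that constant worst-case perturbation would prevent learning.
```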
Xuxi Chen
Despite tremendous success in many application scenarios, the training and inference costs of deep learning are rapidly increasing over time. The lottery ticket hypothesis (LTH) emerges as a promising framework to leverage a special sparse subnetwork (i.e., a winning ticket) instead of the full model for both training and inference, which can lower both costs without sacrificing performance.
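The standard way to find such a winning ticket is iterative magnitude pruning: repeatedly train, zero out the smallest-magnitude surviving weights, and rewind the survivors to their initial values. A minimal sketch (training omitted; the 20% prune rate and sizes are illustrative assumptions):

```python
import numpy as np

def magnitude_prune(weights, mask, prune_frac=0.2):
    """One round of magnitude pruning: drop the smallest-magnitude
    fraction of the still-surviving weights."""
    alive = weights[mask]
    k = int(len(alive) * prune_frac)
    threshold = np.sort(np.abs(alive))[k]     # k-th smallest magnitude
    return mask & (np.abs(weights) >= threshold)

rng = np.random.default_rng(0)
w_init = rng.normal(size=10_000)              # keep the init for rewinding
w = w_init.copy()
mask = np.ones_like(w, dtype=bool)
for _ in range(3):
    # ... train w here, then prune the smallest surviving weights ...
    mask = magnitude_prune(w, mask)
ticket = np.where(mask, w_init, 0.0)          # winning-ticket candidate:
# the surviving subnetwork, rewound to its original initialization
```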
- North America > United States > Texas > Travis County > Austin (0.04)
- Asia > China (0.04)
- Contests & Prizes (0.61)
- Research Report > New Finding (0.46)
- Leisure & Entertainment (0.75)
- Information Technology > Security & Privacy (0.68)
MNN-LLM: A Generic Inference Engine for Fast Large Language Model Deployment on Mobile Devices
Zhaode Wang, Jingbang Yang, Xinyu Qian, Shiwen Xing, Xiaotang Jiang, Chengfei Lv, Shengyu Zhang
Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs. Consequently, edge-device inference presents a promising solution. The primary challenges of edge inference are memory usage and inference speed. This paper introduces MNN-LLM, a framework specifically designed to accelerate the deployment of large language models on mobile devices. MNN-LLM addresses the runtime characteristics of LLMs through model quantization and DRAM-Flash hybrid storage, effectively reducing memory usage. It rearranges weights and inputs based on mobile CPU instruction sets and GPU characteristics, while employing strategies such as multicore load balancing, mixed-precision floating-point operations, and geometric computations to enhance performance. Notably, MNN-LLM achieves up to an 8.6x speed increase compared to current mainstream LLM-specific frameworks.
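To make the memory-saving side concrete, here is a generic sketch of symmetric per-channel 4-bit weight quantization, the kind of scheme such engines use to shrink LLM weights; the exact quantizer MNN-LLM implements may differ, and the function names and shapes here are assumptions.

```python
import numpy as np

def quantize_per_channel(w, bits=4):
    """Symmetric per-output-channel quantization: 4-bit integers plus one
    float scale per row give roughly an 8x reduction vs float32."""
    qmax = 2 ** (bits - 1) - 1                          # 7 for int4
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 512)).astype(np.float32)      # toy weight matrix
q, s = quantize_per_channel(W)
W_hat = dequantize(q, s)
err = np.abs(W - W_hat).max()   # bounded by half a quantization step
```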
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- North America > United States > California > San Diego County > Carlsbad (0.04)
- Asia > China > Beijing > Beijing (0.04)
Task-Oriented Low-Label Semantic Communication With Self-Supervised Learning
Run Gu, Wei Xu, Zhaohui Yang, Dusit Niyato, Aylin Yener
Task-oriented semantic communication enhances transmission efficiency by conveying semantic information rather than exact messages. Deep learning (DL)-based semantic communication can effectively cultivate the essential semantic knowledge for semantic extraction, transmission, and interpretation by leveraging massive labeled samples for downstream task training. In this paper, we propose a self-supervised learning-based semantic communication framework (SLSCom) to enhance task inference performance, particularly in scenarios with limited access to labeled samples. Specifically, we develop a task-relevant semantic encoder using unlabeled samples, which can be collected by devices in real-world edge networks. To facilitate task-relevant semantic extraction, we introduce self-supervision for learning contrastive features and formulate the information bottleneck (IB) problem to balance the tradeoff between the informativeness of the extracted features and task inference performance. Given the computational challenges of the IB problem, we devise a practical and effective solution by employing self-supervised classification and reconstruction pretext tasks. We further propose efficient joint training methods to enhance end-to-end inference accuracy over wireless channels, even with few labeled samples. We evaluate the proposed framework on image classification tasks over multipath wireless channels. Part of this work was presented in WOCC 2024 [1].
Run Gu and Wei Xu are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China, and also with Purple Mountain Laboratories, Nanjing 211111, China (e-mail: {rung, wxu}@seu.edu.cn). Zhaohui Yang is with Zhejiang Lab, Hangzhou 311121, China, and also with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China (yang zhaohui@zju.edu.cn). Dusit Niyato is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 308232 (dniyato@ntu.edu.sg). Aylin Yener is with the Department of Electrical and Computer Engineering, The Ohio State University, OH 43210, USA (yener@ece.osu.edu).
With the widespread deployment of edge devices and the rapid development of artificial intelligence (AI), an impressive landscape of connected intelligence is emerging [2]-[5].
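The "self-supervision for learning contrastive features" mentioned above is commonly realized with an InfoNCE-style loss: two views of the same unlabeled sample are pulled together while all other pairs are pushed apart. A minimal sketch of that generic loss (not SLSCom's exact objective; the temperature and shapes are assumptions):

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """Minimal InfoNCE contrastive loss. z1[i] and z2[i] are embeddings
    of two views of the same unlabeled sample; positives sit on the
    diagonal of the similarity matrix."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau                  # cosine similarities
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 16))
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=(32, 16)))
loss_random = info_nce(z, rng.normal(size=(32, 16)))
# Aligned views give a much lower loss than unrelated embeddings, which
# is the signal that lets the encoder learn without labels.
```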
- Asia > China > Zhejiang Province > Hangzhou (0.44)
- Asia > China > Jiangsu Province > Nanjing (0.44)
- North America > United States > Ohio (0.24)
- Information Technology (0.48)
- Education (0.34)