Fan, Ming
Stealthy Backdoor Attack to Real-world Models in Android Apps
Wei, Jiali, Fan, Ming, Zhang, Xicheng, Jiao, Wenjing, Wang, Haijun, Liu, Ting
Powered by their superior performance, deep neural networks (DNNs) have found widespread applications across various domains. Many deep learning (DL) models are now embedded in mobile apps, making them more accessible to end users through on-device DL. However, deploying on-device DL to users' smartphones simultaneously introduces several security threats. One primary threat is backdoor attacks. Extensive research has explored backdoor attacks for several years and has proposed numerous attack approaches. However, few studies have investigated backdoor attacks on DL models deployed in the real world, or they have shown obvious deficiencies in effectiveness and stealthiness. In this work, we explore more effective and stealthy backdoor attacks on real-world DL models extracted from mobile apps. Our main justification is that imperceptible and sample-specific backdoor triggers generated by DNN-based steganography can enhance the efficacy of backdoor attacks on real-world models. We first confirm the effectiveness of steganography-based backdoor attacks on four state-of-the-art DNN models. Subsequently, we systematically evaluate and analyze the stealthiness of the attacks to ensure they are difficult to perceive. Finally, we implement the backdoor attacks on real-world models and compare our approach with three baseline methods. We collect 38,387 mobile apps, extract 89 DL models from them, and analyze these models to obtain the prerequisite model information for the attacks. After identifying the target models, our approach achieves an average of 12.50% higher attack success rate than DeepPayload while better maintaining the normal performance of the models. Extensive experimental results demonstrate that our method enables more effective, robust, and stealthy backdoor attacks on real-world models.
GenAI4UQ: A Software for Inverse Uncertainty Quantification Using Conditional Generative Models
Fan, Ming, Zhang, Zezhong, Lu, Dan, Zhang, Guannan
We introduce GenAI4UQ, a software package for inverse uncertainty quantification in model calibration, parameter estimation, and ensemble forecasting in scientific applications. GenAI4UQ leverages a generative artificial intelligence (AI) based conditional modeling framework to address the limitations of traditional inverse modeling techniques, such as Markov Chain Monte Carlo methods. By replacing computationally intensive iterative processes with a direct, learned mapping, GenAI4UQ enables efficient calibration of model input parameters and generation of output predictions directly from observations. The software's design allows for rapid ensemble forecasting with robust uncertainty quantification, while maintaining high computational and storage efficiency. GenAI4UQ simplifies the model training process through built-in auto-tuning of hyperparameters, making it accessible to users with varying levels of expertise. Its conditional generative framework ensures versatility, enabling applicability across a wide range of scientific domains. At its core, GenAI4UQ transforms the paradigm of inverse modeling by providing a fast, reliable, and user-friendly solution. It empowers researchers and practitioners to quickly estimate parameter distributions and generate model predictions for new observations, facilitating efficient decision-making and advancing the state of uncertainty quantification in computational modeling. (The code and data are available at https://github.com/patrickfan/GenAI4UQ).
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation
Wang, Yu, Zhao, Shiwan, Wang, Zhihu, Huang, Heyuan, Fan, Ming, Zhang, Yubo, Wang, Zhixing, Wang, Haijun, Liu, Ting
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs). However, despite their widespread adoption and success, CoT methods often exhibit instability due to their inability to consistently ensure the quality of generated reasoning paths, leading to sub-optimal reasoning performance. To address this challenge, we propose the \textbf{Strategic Chain-of-Thought} (SCoT), a novel methodology designed to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps. SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers. Our experiments across eight challenging reasoning datasets demonstrate significant improvements, including a 21.05\% increase on the GSM8K dataset and 24.13\% on the Tracking\_Objects dataset, respectively, using the Llama3-8b model. Additionally, we extend the SCoT framework to develop a few-shot method with automatically matched demonstrations, yielding even stronger results. These findings underscore the efficacy of SCoT, highlighting its potential to substantially enhance LLM performance in complex reasoning tasks.
ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability
Wang, Xiao, Tsaris, Aristeidis, Liu, Siyan, Choi, Jong-Youl, Fan, Ming, Zhang, Wei, Yin, Junqi, Ashfaq, Moetasim, Lu, Dan, Balaprakash, Prasanna
Earth system predictability is challenged by the complexity of environmental dynamics and the multitude of variables involved. Current AI foundation models, although advanced by leveraging large and heterogeneous data, are often constrained by their size and data integration, limiting their effectiveness in addressing the full range of Earth system prediction challenges. To overcome these limitations, we introduce the Oak Ridge Base Foundation Model for Earth System Predictability (ORBIT), an advanced vision-transformer model that scales up to 113 billion parameters using a novel hybrid tensor-data orthogonal parallelism technique. As the largest model of its kind, ORBIT surpasses the current climate AI foundation model size by a thousandfold. Performance scaling tests conducted on the Frontier supercomputer have demonstrated that ORBIT achieves 230 to 707 PFLOPS, with scaling efficiency maintained at 78% to 96% across 24,576 AMD GPUs. These breakthroughs establish new advances in AI-driven climate modeling and demonstrate promise to significantly improve the Earth system predictability.
Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier
Tsaris, Aristeidis, Zhang, Chengming, Wang, Xiao, Yin, Junqi, Liu, Siyan, Ashfaq, Moetasim, Fan, Ming, Choi, Jong Youl, Wahib, Mohamed, Lu, Dan, Balaprakash, Prasanna, Wang, Feiyi
Vision Transformers (ViTs) are pivotal for foundational models in scientific imagery, including Earth science applications, due to their capability to process large sequence lengths. While transformers for text has inspired scaling sequence lengths in ViTs, yet adapting these for ViTs introduces unique challenges. We develop distributed sequence parallelism for ViTs, enabling them to handle up to 1M tokens. Our approach, leveraging DeepSpeed-Ulysses and Long-Sequence-Segmentation with model sharding, is the first to apply sequence parallelism in ViT training, achieving a 94% batch scaling efficiency on 2,048 AMD-MI250X GPUs. Evaluating sequence parallelism in ViTs, particularly in models up to 10B parameters, highlighted substantial bottlenecks. We countered these with hybrid sequence, pipeline, tensor parallelism, and flash attention strategies, to scale beyond single GPU memory limits. Our method significantly enhances climate modeling accuracy by 20% in temperature predictions, marking the first training of a transformer model on a full-attention matrix over 188K sequence length.
Conditional Pseudo-Reversible Normalizing Flow for Surrogate Modeling in Quantifying Uncertainty Propagation
Yang, Minglei, Wang, Pengjun, Fan, Ming, Lu, Dan, Cao, Yanzhao, Zhang, Guannan
We introduce a conditional pseudo-reversible normalizing flow for constructing surrogate models of a physical model polluted by additive noise to efficiently quantify forward and inverse uncertainty propagation. Existing surrogate modeling approaches usually focus on approximating the deterministic component of physical model. However, this strategy necessitates knowledge of noise and resorts to auxiliary sampling methods for quantifying inverse uncertainty propagation. In this work, we develop the conditional pseudo-reversible normalizing flow model to directly learn and efficiently generate samples from the conditional probability density functions. The training process utilizes dataset consisting of input-output pairs without requiring prior knowledge about the noise and the function. Our model, once trained, can generate samples from any conditional probability density functions whose high probability regions are covered by the training set. Moreover, the pseudo-reversibility feature allows for the use of fully-connected neural network architectures, which simplifies the implementation and enables theoretical analysis. We provide a rigorous convergence analysis of the conditional pseudo-reversible normalizing flow model, showing its ability to converge to the target conditional probability density function using the Kullback-Leibler divergence. To demonstrate the effectiveness of our method, we apply it to several benchmark tests and a real-world geologic carbon storage problem.
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies
Song, Shuaiwen Leon, Kruft, Bonnie, Zhang, Minjia, Li, Conglong, Chen, Shiyang, Zhang, Chengming, Tanaka, Masahiro, Wu, Xiaoxia, Rasley, Jeff, Awan, Ammar Ahmad, Holmes, Connor, Cai, Martin, Ghanem, Adam, Zhou, Zhongzhu, He, Yuxiong, Luferenko, Pete, Kumar, Divya, Weyn, Jonathan, Zhang, Ruixiong, Klocek, Sylwester, Vragov, Volodymyr, AlQuraishi, Mohammed, Ahdritz, Gustaf, Floristean, Christina, Negri, Cristina, Kotamarthi, Rao, Vishwanath, Venkatram, Ramanathan, Arvind, Foreman, Sam, Hippe, Kyle, Arcomano, Troy, Maulik, Romit, Zvyagin, Maxim, Brace, Alexander, Zhang, Bin, Bohorquez, Cindy Orozco, Clyde, Austin, Kale, Bharat, Perez-Rivera, Danilo, Ma, Heng, Mann, Carla M., Irvin, Michael, Pauloski, J. Gregory, Ward, Logan, Hayot, Valerie, Emani, Murali, Xie, Zhen, Lin, Diangen, Shukla, Maulik, Foster, Ian, Davis, James J., Papka, Michael E., Brettin, Thomas, Balaprakash, Prasanna, Tourassi, Gina, Gounley, John, Hanson, Heidi, Potok, Thomas E, Pasini, Massimiliano Lupo, Evans, Kate, Lu, Dan, Lunga, Dalton, Yin, Junqi, Dash, Sajal, Wang, Feiyi, Shankar, Mallikarjun, Lyngaas, Isaac, Wang, Xiao, Cong, Guojing, Zhang, Pei, Fan, Ming, Liu, Siyan, Hoisie, Adolfy, Yoo, Shinjae, Ren, Yihui, Tang, William, Felker, Kyle, Svyatkovskiy, Alexey, Liu, Hang, Aji, Ashwin, Dalton, Angela, Schulte, Michael, Schulz, Karl, Deng, Yuntian, Nie, Weili, Romero, Josh, Dallago, Christian, Vahdat, Arash, Xiao, Chaowei, Gibbs, Thomas, Anandkumar, Anima, Stevens, Rick
In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique capabilities through AI system technology innovations to help domain experts to unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
R\'{e}nyi Divergence Deep Mutual Learning
Huang, Weipeng, Tao, Junjie, Deng, Changbo, Fan, Ming, Wan, Wenqiang, Xiong, Qi, Piao, Guangyuan
This paper revisits Deep Mutual Learning (DML), a simple yet effective computing paradigm. We propose using R\'{e}nyi divergence instead of the KL divergence, which is more flexible and tunable, to improve vanilla DML. This modification is able to consistently improve performance over vanilla DML with limited additional complexity. The convergence properties of the proposed paradigm are analyzed theoretically, and Stochastic Gradient Descent with a constant learning rate is shown to converge with $\mathcal{O}(1)$-bias in the worst case scenario for nonconvex optimization tasks. That is, learning will reach nearby local optima but continue searching within a bounded scope, which may help mitigate overfitting. Finally, our extensive empirical results demonstrate the advantage of combining DML and R\'{e}nyi divergence, leading to further improvement in model generalization.
BDMMT: Backdoor Sample Detection for Language Models through Model Mutation Testing
Wei, Jiali, Fan, Ming, Jiao, Wenjing, Jin, Wuxia, Liu, Ting
Deep neural networks (DNNs) and natural language processing (NLP) systems have developed rapidly and have been widely used in various real-world fields. However, they have been shown to be vulnerable to backdoor attacks. Specifically, the adversary injects a backdoor into the model during the training phase, so that input samples with backdoor triggers are classified as the target class. Some attacks have achieved high attack success rates on the pre-trained language models (LMs), but there have yet to be effective defense methods. In this work, we propose a defense method based on deep model mutation testing. Our main justification is that backdoor samples are much more robust than clean samples if we impose random mutations on the LMs and that backdoors are generalizable. We first confirm the effectiveness of model mutation testing in detecting backdoor samples and select the most appropriate mutation operators. We then systematically defend against three extensively studied backdoor attack levels (i.e., char-level, word-level, and sentence-level) by detecting backdoor samples. We also make the first attempt to defend against the latest style-level backdoor attacks. We evaluate our approach on three benchmark datasets (i.e., IMDB, Yelp, and AG news) and three style transfer datasets (i.e., SST-2, Hate-speech, and AG news). The extensive experimental results demonstrate that our approach can detect backdoor samples more efficiently and accurately than the three state-of-the-art defense approaches.
A Unified Framework for Analyzing and Detecting Malicious Examples of DNN Models
Jin, Kaidi, Zhang, Tianwei, Shen, Chao, Chen, Yufei, Fan, Ming, Lin, Chenhao, Liu, Ting
Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications on the input can mislead the models to give wrong results. Although defenses against adversarial attacks have been widely studied, research on mitigating backdoor attacks is still at an early stage. It is unknown whether there are any connections and common characteristics between the defenses against these two attacks. In this paper, we present a unified framework for detecting malicious examples and protecting the inference results of Deep Learning models. This framework is based on our observation that both adversarial examples and backdoor examples have anomalies during the inference process, highly distinguishable from benign samples. As a result, we repurpose and revise four existing adversarial defense methods for detecting backdoor examples. Extensive evaluations indicate these approaches provide reliable protection against backdoor attacks, with a higher accuracy than detecting adversarial examples. These solutions also reveal the relations of adversarial examples, backdoor examples and normal samples in model sensitivity, activation space and feature space. This can enhance our understanding about the inherent features of these two attacks, as well as the defense opportunities.