Zhou, Hanxu
White-box Multimodal Jailbreaks Against Large Vision-Language Models
Wang, Ruofan, Ma, Xingjun, Zhou, Hanxu, Ji, Chuanjun, Ye, Guangnan, Jiang, Yu-Gang
Recent advancements in Large Vision-Language Models (VLMs) have underscored their superiority in various multimodal tasks. However, the adversarial robustness of VLMs has not been fully explored. Existing methods mainly assess robustness through unimodal adversarial attacks that perturb images, while assuming inherent resilience against text-based attacks. Different from existing attacks, in this work we propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within VLMs. Specifically, we propose a dual optimization objective aimed at guiding the model to generate affirmative responses with high toxicity. Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input, thus imbuing the image with toxic semantics. Subsequently, an adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions. The discovered adversarial image prefix and text suffix are collectively denoted as a Universal Master Key (UMK). When integrated into various malicious queries, UMK can circumvent the alignment defenses of VLMs and lead to the generation of objectionable content, known as jailbreaks. The experimental results demonstrate that our universal attack strategy can effectively jailbreak MiniGPT-4 with a 96% success rate, highlighting the vulnerability of VLMs and the urgent need for new alignment strategies.
Understanding Time Series Anomaly State Detection through One-Class Classification
Zhou, Hanxu, Zhang, Yuan, Leng, Guangjie, Wang, Ruofan, Xu, Zhi-Qin John
For a long time, research on time series anomaly detection has mainly focused on finding outliers within a given time series. Admittedly, this is consistent with some practical problems, but in other practical application scenarios, people are concerned about: assuming a standard time series is given, how to judge whether another test time series deviates from the standard time series, which is more similar to the problem discussed in one-class classification (OCC). Therefore, in this article, we try to re-understand and define the time series anomaly detection problem through OCC, which we call 'time series anomaly state detection problem'. We first use stochastic processes and hypothesis testing to strictly define the 'time series anomaly state detection problem', and its corresponding anomalies. Then, we use the time series classification dataset to construct an artificial dataset corresponding to the problem. We compile 38 anomaly detection algorithms and correct some of the algorithms to adapt to handle this problem. Finally, through a large number of experiments, we fairly compare the actual performance of various time series anomaly detection algorithms, providing insights and directions for future research by researchers.
Understanding the Initial Condensation of Convolutional Neural Networks
Zhou, Zhangchen, Zhou, Hanxu, Li, Yuqing, Xu, Zhi-Qin John
Previous research has shown that fully-connected networks with small initialization and gradient-based training methods exhibit a phenomenon known as condensation during training. This phenomenon refers to the input weights of hidden neurons condensing into isolated orientations during training, revealing an implicit bias towards simple solutions in the parameter space. However, the impact of neural network structure on condensation has not been investigated yet. In this study, we focus on the investigation of convolutional neural networks (CNNs). Our experiments suggest that when subjected to small initialization and gradient-based training methods, kernel weights within the same CNN layer also cluster together during training, demonstrating a significant degree of condensation. Theoretically, we demonstrate that in a finite training period, kernels of a two-layer CNN with small initialization will converge to one or a few directions. This work represents a step towards a better understanding of the non-linear training behavior exhibited by neural networks with specialized structures.
Towards Understanding the Condensation of Two-layer Neural Networks at Initial Training
Xu, Zhi-Qin John, Zhou, Hanxu, Luo, Tao, Zhang, Yaoyu
Studying the implicit regularization effect of the nonlinear training dynamics of neural networks (NNs) is important for understanding why over-parameterized neural networks often generalize well on real dataset. Empirically, existing works have shown that weights of NNs condense on isolated orientations with a small initialization. The condensation dynamics implies that NNs can learn features from the training data with a network configuration effectively equivalent to a much smaller network during the training. In this work, we show that the multiple roots of activation function at origin is a key factor to understanding the condensation at the initial stage of training. Our experiments suggest that the maximal number of condensed orientations is twice of the multiplicity. Our theoretical analysis confirms experiments for two cases, one is for the activation function of multiplicity one and the other is for the one-dimensional input. This work makes a step towards understanding how small initialization implicitly leads NNs to condensation at initial stage of training, which lays a solid foundation for the future study of the nonlinear dynamics of NNs and its implicit regularization effect at a later stage of training.
Deep frequency principle towards understanding why deeper learning is faster
Xu, Zhi-Qin John, Zhou, Hanxu
Understanding the effect of depth in deep learning is a critical problem. In this work, we utilize the Fourier analysis to empirically provide a promising mechanism to understand why deeper learning is faster. To this end, we separate a deep neural network into two parts, one is a pre-condition component and the other is a learning component, in which the output of the pre-condition one is the input of the learning one. Based on experiments of deep networks and real dataset, we propose a deep frequency principle, that is, the effective target function for a deeper hidden layer has a bias towards a function with more low frequency during the training. Therefore, the learning component effectively learns a lower frequency function if the pre-condition component has more layers. Due to the well-studied frequency principle, i.e., deep neural networks learn lower frequency functions faster, the deep frequency principle provides a reasonable explanation to why deeper learning is faster. We believe these empirical studies would be valuable for future theoretical studies of the effect of depth in deep learning.