perturbation direction
Boosting Adversarial Transferability via Residual Perturbation Attack
Peng, Jinjia, Tao, Zeze, Wang, Huibing, Wang, Meng, Wang, Yang
Deep neural networks are susceptible to adversarial examples: imperceptible perturbations that cause incorrect predictions. Transfer-based attacks craft adversarial examples on a surrogate model and transfer them to target models in black-box scenarios. Recent studies reveal that adversarial examples lying in flat regions of the loss landscape exhibit superior transferability, since flatness alleviates over-fitting to the surrogate model. However, prior works overlook the influence of perturbation directions, which limits transferability. In this paper, we propose a novel attack method, named Residual Perturbation Attack (ResPA), which relies on the residual gradient as the perturbation direction to guide adversarial examples toward flat regions of the loss function. Specifically, ResPA applies an exponential moving average to the input gradients to obtain the first moment as a reference gradient, which encodes the direction of historical gradients. Instead of relying solely on the local flatness implied by the current gradient as the perturbation direction, ResPA further considers the residual between the current gradient and the reference gradient to capture changes in the global perturbation direction. Experimental results demonstrate that ResPA transfers better than existing typical transfer-based attack methods, and its transferability can be further improved by combining ResPA with current input transformation methods. The code is available at https://github.com/ZezeTao/ResPA.
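The abstract names the two ingredients (an EMA reference gradient and the current-minus-reference residual) but not the full update rule, so the following is only a minimal sketch of the idea in PyTorch. The I-FGSM-style sign update, step size `alpha`, budget `eps`, and decay `mu` are assumptions for illustration, not details taken from the paper.

```python
import torch

def respa_style_attack(model, loss_fn, x, y, eps=8 / 255, alpha=2 / 255, steps=10, mu=0.9):
    """Hedged sketch: iterative attack whose step direction is the residual between
    the current input gradient and its exponential moving average (reference gradient)."""
    x_adv = x.clone().detach()
    ref = torch.zeros_like(x)                        # reference gradient (first moment)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(loss_fn(model(x_adv), y), x_adv)[0]
        ref = mu * ref + (1.0 - mu) * grad           # EMA of input gradients
        residual = grad - ref                        # change relative to the historical direction
        x_adv = x_adv.detach() + alpha * residual.sign()
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)   # stay inside the eps-ball around x
        x_adv = torch.clamp(x_adv, 0.0, 1.0)            # keep a valid image
    return x_adv.detach()
```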
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
Lee, Daniel J., Heimersheim, Stefan
Sensitive-direction experiments attempt to understand the computational features of Language Models (LMs) by measuring how much the next-token prediction probabilities change when activations are perturbed along specific directions. We extend this line of work by introducing an improved baseline for perturbation directions. We demonstrate that the KL divergence for Sparse Autoencoder (SAE) reconstruction errors is no longer pathologically high compared to the improved baseline. We also show that feature directions uncovered by SAEs have varying impacts on model outputs depending on the SAE's sparsity, with lower-L0 SAE feature directions exerting a greater influence. Additionally, we find that end-to-end SAE features do not exhibit stronger effects on model outputs than those of traditional SAEs.
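A minimal sketch of the underlying measurement, assuming a Hugging Face GPT-2-style causal LM whose block outputs can be modified with a forward hook; the hook point (e.g. `model.transformer.h[6]`), the perturbation scale, and perturbing every token position are illustrative assumptions rather than the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def kl_after_perturbation(model, input_ids, layer, direction, scale=1.0):
    """Hedged sketch: KL(original || perturbed) of the next-token distribution after
    nudging one layer's output activations along `direction` (shape: hidden size)."""
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction          # perturb the activations at every position
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    with torch.no_grad():
        base_logits = model(input_ids).logits[:, -1]     # unperturbed next-token logits
        handle = layer.register_forward_hook(hook)
        try:
            pert_logits = model(input_ids).logits[:, -1]
        finally:
            handle.remove()

    return F.kl_div(F.log_softmax(pert_logits, dim=-1),
                    F.log_softmax(base_logits, dim=-1),
                    log_target=True, reduction="batchmean")
```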
Reviews: On Blackbox Backpropagation and Jacobian Sensing
The paper focuses on automatic differentiation of multi-variable, vector-valued functions in the context of training model parameters from input data. A standard approach in a number of popular environments relies on backpropagating the estimation error, computed from the (local) gradient of the loss function, to update the model parameters. In this paper, the emphasis is on settings where some of the operators of the model are exogenous blackboxes for which the gradient cannot be computed explicitly, so one resorts to finite differencing of the function of interest. Such an approach can be prohibitively expensive unless the Jacobian has some special structure that can be exploited. The strategy pursued in this paper consists in exploiting the relationship between graph colouring and Jacobian estimation.
On Blackbox Backpropagation and Jacobian Sensing
Krzysztof M. Choromanski, Vikas Sindhwani
From a small number of calls to a given "blackbox" on random input perturbations, we show how to efficiently recover its unknown Jacobian, or estimate the left action of its Jacobian on a given vector. Our methods are based on a novel combination of compressed sensing and graph coloring techniques, and provably exploit structural prior knowledge about the Jacobian such as sparsity and symmetry while being noise robust. We demonstrate efficient backpropagation through noisy blackbox layers in a deep neural net, improved data-efficiency in the task of linearizing the dynamics of a rigid body system, and the generic ability to handle a rich class of input-output dependency structures in Jacobian estimation problems.
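The paper's method additionally uses compressed sensing and is noise robust; the sketch below shows only the simpler colouring intuition behind it, assuming the Jacobian's sparsity pattern is known in advance: columns whose nonzero rows never overlap can share a single finite-difference probe, so the number of blackbox calls drops from the input dimension to the number of colour groups. The greedy grouping is an illustration, not the paper's algorithm.

```python
import numpy as np

def greedy_column_groups(sparsity):
    """Group columns whose nonzero row supports are disjoint (greedy graph colouring)."""
    groups = []
    for j in range(sparsity.shape[1]):
        for g in groups:
            if not any(np.any(sparsity[:, j] & sparsity[:, k]) for k in g):
                g.append(j)
                break
        else:
            groups.append([j])
    return groups

def jacobian_by_coloring(f, x, sparsity, h=1e-6):
    """Estimate a sparse Jacobian with one forward finite difference per column group."""
    f0 = f(x)
    J = np.zeros(sparsity.shape)
    for g in greedy_column_groups(sparsity):
        d = np.zeros_like(x)
        d[g] = h                                   # probe all columns of the group at once
        df = (f(x + d) - f0) / h
        for j in g:
            rows = sparsity[:, j]
            J[rows, j] = df[rows]                  # disjoint supports let us unmix the response
    return J
```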
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Wang, Feiyang, Zuo, Xingquan, Huang, Hai, Chen, Gang
Many machine learning models are susceptible to adversarial attacks, with decision-based black-box attacks representing the most critical threat in real-world applications. These attacks are extremely stealthy, generating adversarial examples using hard labels obtained from the target machine learning model. This is typically realized by optimizing perturbation directions, guided by decision boundaries identified through query-intensive exact search, which significantly limits the attack success rate. This paper introduces a novel approach that uses an Approximation Decision Boundary (ADB) to efficiently and accurately compare perturbation directions without precisely determining decision boundaries. The effectiveness of our ADB approach (ADBA) hinges on promptly identifying a suitable ADB that ensures reliable differentiation of all perturbation directions. For this purpose, we analyze the probability distribution of decision boundaries, confirming that using the distribution's median value as the ADB can effectively distinguish different perturbation directions, giving rise to the ADBA-md algorithm. ADBA-md requires only four queries on average to differentiate any pair of perturbation directions, making it highly query-efficient. Extensive experiments on six well-known image classifiers clearly demonstrate the superiority of ADBA and ADBA-md over multiple state-of-the-art black-box attacks. The source code is available at https://github.com/BUPTAIOC/ADBA.
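A minimal sketch of the comparison primitive the abstract describes, assuming a hard-label classifier `predict` and a single trial radius playing the role of the approximate decision boundary: the direction that already flips the label at that radius is judged better, so no exact boundary search is needed. The concrete query schedule and the median-based radius estimation of ADBA-md are simplified away here.

```python
import numpy as np

def crosses_boundary(predict, x, y_true, direction, radius):
    """One hard-label query: does x + radius * unit(direction) change the predicted class?"""
    d = direction / np.linalg.norm(direction)
    return predict(x + radius * d) != y_true

def better_direction(predict, x, y_true, dir_a, dir_b, adb_radius):
    """Hedged sketch of an ADB-style comparison: prefer the direction that already flips
    the label at the shared approximate-boundary radius; a tie means the radius was
    uninformative and would be adjusted in the full method."""
    a_flips = crosses_boundary(predict, x, y_true, dir_a, adb_radius)
    b_flips = crosses_boundary(predict, x, y_true, dir_b, adb_radius)
    if a_flips and not b_flips:
        return dir_a
    if b_flips and not a_flips:
        return dir_b
    return None  # indistinguishable at this radius
```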
Visualizing, Rethinking, and Mining the Loss Landscape of Deep Neural Networks
Li, Xin-Chun, Li, Lan, Zhan, De-Chuan
The loss landscape of deep neural networks (DNNs) is commonly considered complex and wildly fluctuating. However, an interesting observation is that the loss curves plotted along Gaussian noise directions are almost always v-basin shaped, with the perturbed model lying in the basin. This motivates us to rethink whether 1D or 2D subspaces can capture more complex local geometric structures, and how to mine the corresponding perturbation directions. This paper systematically categorizes 1D curves from simple to complex, including v-basin, v-side, w-basin, w-peak, and vvv-basin curves. Notably, the latter two types are hard to obtain via intuitive construction of specific perturbation directions, and we propose suitable mining algorithms to plot the corresponding 1D curves. Combining these 1D directions, we visualize various types of 2D surfaces, such as saddle surfaces and wine-bottle-bottom surfaces, which previous works showed only with demo functions. Finally, we offer theoretical insights through the lens of the Hessian matrix to explain the several interesting phenomena observed.
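A minimal sketch of the basic 1D visualization this line of work builds on, assuming a PyTorch model and a fixed batch: evaluate the loss at theta + alpha * d over a range of alpha along a chosen direction d. The per-parameter Gaussian direction below is just one common choice, not the paper's mining procedure.

```python
import torch

def gaussian_direction(model):
    """One simple choice of perturbation direction: i.i.d. Gaussian noise per parameter."""
    return [torch.randn_like(p) for p in model.parameters()]

def loss_curve_1d(model, loss_fn, data, target, direction, alphas):
    """Hedged sketch: evaluate loss(theta + alpha * d) along one perturbation direction d,
    given as a list of tensors shaped like the model's parameters."""
    originals = [p.detach().clone() for p in model.parameters()]
    losses = []
    with torch.no_grad():
        for a in alphas:
            for p, p0, d in zip(model.parameters(), originals, direction):
                p.copy_(p0 + a * d)                        # move along the direction
            losses.append(loss_fn(model(data), target).item())
        for p, p0 in zip(model.parameters(), originals):   # restore the original weights
            p.copy_(p0)
    return losses
```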
MISA: Unveiling the Vulnerabilities in Split Federated Learning
Wan, Wei, Ning, Yuxuan, Hu, Shengshan, Xue, Lulu, Li, Minghui, Zhang, Leo Yu, Jin, Hai
Federated learning (FL) and split learning (SL) have been prevailing distributed paradigms in recent years. Both enable shared global model training while keeping data localized on users' devices. The former excels in parallel execution capabilities, while the latter enjoys low dependence on edge computing resources and strong privacy protection. Split federated learning (SFL) combines the strengths of both FL and SL, making it one of the most popular distributed architectures. Furthermore, a recent study has claimed that SFL exhibits robustness against poisoning attacks, with a fivefold improvement over FL in terms of robustness. In this paper, we present a novel poisoning attack known as MISA. It poisons both the top and bottom models, causing a misalignment in the global model (hence the name) and ultimately a drastic accuracy collapse. This attack unveils the vulnerabilities in SFL, challenging the conventional belief that SFL is robust against poisoning attacks. Extensive experiments demonstrate that our proposed MISA poses a significant threat to the availability of SFL, underscoring the imperative for academia and industry to accord this matter due attention.
Combining Adversaries with Anti-adversaries in Training
Zhou, Xiaoling, Yang, Nan, Wu, Ou
Adversarial training is an effective learning technique for improving the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope, in which different samples can have different perturbation directions (adversarial or anti-adversarial) and varied perturbation bounds. Our theoretical explorations suggest that, compared with standard adversarial training, combining adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can achieve better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy-label learning and imbalanced learning). On the basis of our theoretical findings, a more general learning objective that combines adversaries and anti-adversaries with varied bounds for each training sample is presented. Meta-learning is used to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.
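A minimal sketch of the objective's flavor, assuming single-step FGSM-style perturbations: the per-sample combination weights are shown as one fixed scalar `w_adv`, whereas the paper learns them via meta-learning, and the per-sample perturbation bounds are collapsed to a single `eps` for brevity.

```python
import torch

def combined_adv_loss(model, loss_fn, x, y, eps=4 / 255, w_adv=0.5):
    """Hedged sketch: mix losses on adversarial (x + eps*sign(g)) and
    anti-adversarial (x - eps*sign(g)) versions of each batch."""
    x = x.clone().detach().requires_grad_(True)
    clean_loss = loss_fn(model(x), y)
    grad_sign = torch.autograd.grad(clean_loss, x)[0].sign()

    x_adv = (x.detach() + eps * grad_sign).clamp(0, 1)    # adversarial direction
    x_anti = (x.detach() - eps * grad_sign).clamp(0, 1)   # anti-adversarial direction

    return w_adv * loss_fn(model(x_adv), y) + (1 - w_adv) * loss_fn(model(x_anti), y)
```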
Should Adversarial Attacks Use Pixel p-Norm?
Sen, Ayon, Zhu, Xiaojin, Marshall, Liam, Nowak, Robert
Adversarial attacks aim to confound machine learning systems, while remaining virtually imperceptible to humans. Attacks on image classification systems are typically gauged in terms of $p$-norm distortions in the pixel feature space. We perform a behavioral study, demonstrating that the pixel $p$-norm for any $0\le p \le \infty$, and several alternative measures including earth mover's distance, structural similarity index, and deep net embedding, do not fit human perception. Our result has the potential to improve the understanding of adversarial attack and defense strategies.
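For reference, the pixel-space distortion measures under discussion are straightforward to compute; a small sketch, assuming images are NumPy arrays scaled to [0, 1]:

```python
import numpy as np

def pixel_p_norm(x, x_adv, p):
    """Pixel-space p-norm distortion between an image and its adversarial version."""
    d = (x_adv - x).ravel()
    if p == 0:
        return np.count_nonzero(d)          # number of changed pixel values
    if np.isinf(p):
        return np.abs(d).max()              # largest single-pixel change
    return (np.abs(d) ** p).sum() ** (1.0 / p)
```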