Imperceptibility


AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks

Neural Information Processing Systems

Imperceptible adversarial attacks aim to fool DNNs by adding imperceptible perturbation to the input data. Previous methods typically improve the imperceptibility of attacks by integrating common attack paradigms with specifically designed perception-based losses or the capabilities of generative models. In this paper, we propose Adversarial Attacks in Diffusion (AdvAD), a novel modeling framework distinct from existing attack paradigms.


DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples

Nafi, Abdullah Al Nomaan, Rahaman, Habibur, Haider, Zafaryab, Mahfuz, Tanzim, Suya, Fnu, Bhunia, Swarup, Chakraborty, Prabuddha

arXiv.org Artificial Intelligence

Numerous techniques have been proposed for generating adversarial examples in white-box settings under strict Lp-norm constraints. However, such norm-bounded examples often fail to align well with human perception, and only recently have a few methods begun specifically exploring perceptually aligned adversarial examples. Moreover, it remains unclear whether insights from Lp-constrained attacks can be effectively leveraged to improve perceptual efficacy. In this paper, we introduce DAASH, a fully differentiable meta-attack framework that generates effective and perceptually aligned adversarial examples by strategically composing existing Lp-based attack methods. DAASH operates in a multi-stage fashion: at each stage, it aggregates candidate adversarial examples from multiple base attacks using learned, adaptive weights and propagates the result to the next stage. A novel meta-loss function guides this process by jointly minimizing misclassification loss and perceptual distortion, enabling the framework to dynamically modulate the contribution of each base attack throughout the stages. We evaluate DAASH on adversarially trained models across CIFAR-10, CIFAR-100, and ImageNet. Despite relying solely on Lp-constrained base methods, DAASH significantly outperforms state-of-the-art perceptual attacks such as AdvAD, achieving higher attack success rates (e.g., a 20.63% improvement) and superior visual quality, as measured by SSIM, LPIPS, and FID (improvements of approximately 11, 0.015, and 5.7, respectively). Furthermore, DAASH generalizes well to unseen defenses, making it a practical and strong baseline for evaluating robustness without requiring handcrafted adaptive attacks for each new defense.
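The multi-stage aggregation DAASH describes can be illustrated with a minimal sketch. This is not the authors' code: the softmax weighting, array shapes, and combination rule below are assumptions chosen for illustration.

```python
import numpy as np

def aggregate_stage(candidates, weight_logits):
    """One DAASH-style stage: blend candidate adversarial examples
    from several base attacks with softmax-normalized adaptive weights.

    candidates:    array of shape (n_attacks, *input_shape)
    weight_logits: array of shape (n_attacks,), learned in the real method
    """
    w = np.exp(weight_logits - weight_logits.max())
    w /= w.sum()
    # Convex combination of the candidates; in the actual framework the
    # result is propagated to the next stage and the weights are updated
    # by a meta-loss balancing misclassification and perceptual distortion.
    return np.tensordot(w, candidates, axes=1)

# Toy usage: three base-attack outputs on a 2x2 input.
cands = np.stack([np.full((2, 2), v) for v in (0.0, 0.5, 1.0)])
x_next = aggregate_stage(cands, np.array([0.0, 0.0, 0.0]))  # equal weights -> mean
```

With equal logits the stage simply averages the candidates; in the trained framework, stronger base attacks would receive larger weights.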


TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data

He, Zhipeng, Ouyang, Chun, Wen, Lijie, Liu, Cong, Moreira, Catarina

arXiv.org Artificial Intelligence

However, with these advancements comes increasing concern about the robustness and security of models, particularly in the context of adversarial attacks. Adversarial attacks involve the intentional manipulation of input data to deceive machine learning models, causing incorrect or misleading outputs (Szegedy et al., 2014). This area of research has drawn significant attention as researchers strive to understand and mitigate the vulnerabilities in various types of data and models. In Computer Vision (CV), adversarial perturbations to images involve pixel intensity modifications (Weng et al., 2024), spatial transformations (Aydin & Temizel, 2023), texture perturbations (Geirhos et al., 2018), and localised patches (Wang et al., 2025) that cause dramatic misclassifications while remaining visually imperceptible. Similarly, in Natural Language Processing (Zhang et al., 2020), attacks typically involve word substitutions (Yang et al., 2023), character-level modifications (Rocamora et al., 2024), or syntactic transformations (Asl et al., 2024) that preserve semantic meaning while fooling text classifiers (Gao et al., 2024). Adversarial vulnerabilities have also been demonstrated in audio processing (Noureddine et al., 2023) through amplitude modifications (Ko et al., 2023), frequency perturbations (Abdullah et al., 2019), and psychoacoustic masking (Qin et al., 2019) that cause speech recognition systems to misinterpret commands. By addressing the vulnerabilities in these types of data, researchers aim to develop more robust and secure machine learning systems across various domains.


Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data

He, Zhipeng, Stevens, Alexander, Ouyang, Chun, De Smedt, Johannes, Barros, Alistair, Moreira, Catarina

arXiv.org Artificial Intelligence

Adversarial attacks on tabular data present unique challenges due to the heterogeneous nature of mixed categorical and numerical features. Unlike images, where pixel perturbations maintain visual similarity, tabular data lacks intuitive similarity metrics, making it difficult to define imperceptible modifications. Additionally, traditional gradient-based methods prioritise $\ell_p$-norm constraints, often producing adversarial examples that deviate from the original data distributions. To address this, we propose a latent-space perturbation framework using a mixed-input Variational Autoencoder (VAE) to generate statistically consistent adversarial examples. The proposed VAE integrates categorical embeddings and numerical features into a unified latent manifold, enabling perturbations that preserve statistical consistency. We introduce the In-Distribution Success Rate (IDSR) to jointly evaluate attack effectiveness and distributional alignment. Evaluation across six publicly available datasets and three model architectures demonstrates that our method achieves substantially lower outlier rates, higher IDSR, and more consistent performance than traditional input-space attacks and other VAE-based methods adapted from image-domain approaches. Our comprehensive analyses of hyperparameter sensitivity, sparsity control, and generative architecture demonstrate that the effectiveness of VAE-based attacks depends strongly on reconstruction quality and the availability of sufficient training data. When these conditions are met, the proposed framework achieves superior practical utility and stability compared with input-space methods. This work underscores the importance of maintaining on-manifold perturbations for generating realistic and robust adversarial examples in tabular domains.
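The core latent-space idea can be sketched in a few lines. The linear "encoder" and "decoder" below are stand-ins for the paper's trained mixed-input VAE, and the attack-loss gradient is hypothetical; the sketch only shows why perturbing the latent code and decoding keeps the example near the generative model's manifold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for a trained VAE: a linear encoder/decoder pair
# (hypothetical; the paper uses a learned mixed-input VAE).
W = rng.normal(size=(4, 2))
encode = lambda x: x @ W      # input (4,) -> latent (2,)
decode = lambda z: z @ W.T    # latent (2,) -> reconstruction (4,)

def latent_attack(x, grad_x_fn, step=0.1, n_iter=10):
    """Perturb in latent space, then decode, so the adversarial example
    stays on the data manifold defined by the generative model."""
    z = encode(x)
    for _ in range(n_iter):
        g_x = grad_x_fn(decode(z))   # attack-loss gradient w.r.t. decoded input
        z = z + step * (g_x @ W)     # chain rule through the linear decoder
    return decode(z)

x = np.array([1.0, 0.0, 0.0, 0.0])
# Hypothetical attack objective: push the decoded sample toward zero.
x_adv = latent_attack(x, grad_x_fn=lambda x_dec: -x_dec)
```

An input-space attack would instead update `x` directly, which is exactly what lets it drift off-distribution; here every iterate is, by construction, a decoder output.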



SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning

Huang, Tairan, Jin, Yulin, Liu, Junxu, Ye, Qingqing, Hu, Haibo

arXiv.org Artificial Intelligence

Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-box attacks focus on vector-based or discrete-action RL, and their effectiveness on image-based continuous control is limited by the large action space and excessive environment queries. We propose SEBA, a sample-efficient framework for black-box adversarial attacks on visual RL agents. SEBA integrates a shadow Q model that estimates cumulative rewards under adversarial conditions, a generative adversarial network that produces visually imperceptible perturbations, and a world model that simulates environment dynamics to reduce real-world queries. Through a two-stage iterative training procedure that alternates between learning the shadow model and refining the generator, SEBA achieves strong attack performance while maintaining efficiency. Experiments on MuJoCo and Atari benchmarks show that SEBA significantly reduces cumulative rewards, preserves visual fidelity, and greatly decreases environment interactions compared to prior black-box and white-box methods. Our code is provided in the supplementary material.


Retrieval-Augmented Review Generation for Poisoning Recommender Systems

Yang, Shiyi, Li, Xinshu, Zhou, Guanglin, Wang, Chen, Xu, Xiwei, Zhu, Liming, Yao, Lina

arXiv.org Artificial Intelligence

Abstract--Recent studies have shown that recommender systems (RSs) are highly vulnerable to data poisoning attacks, where malicious actors inject fake user profiles, including a group of well-designed fake ratings, to manipulate recommendations. Due to security and privacy constraints in practice, attackers typically possess limited knowledge of the victim system and thus need to craft profiles that transfer across black-box RSs. To maximize the attack impact, the profiles must also remain imperceptible. However, generating such high-quality profiles with the restricted resources is challenging. Some works suggest incorporating fake textual reviews to strengthen the profiles; yet, the poor quality of the reviews largely undermines the attack effectiveness and imperceptibility under the practical setting. To tackle the above challenges, in this paper, we propose to enhance the quality of the review text by harnessing the in-context learning (ICL) capabilities of multimodal foundation models. To this end, we introduce a demonstration retrieval algorithm and a text style transfer strategy to augment naive ICL. Specifically, we propose a novel practical attack framework named RAGAN to generate high-quality fake user profiles, which can give insights into the robustness of RSs. The profiles are generated by a jailbreaker and collaboratively optimized on an instructional agent and a guardian to improve the attack transferability and imperceptibility. Comprehensive experiments on various real-world datasets demonstrate that RAGAN achieves state-of-the-art poisoning attack performance. Impact Statement--Recommender systems play a vital role across e-commerce, online content, and social media platforms, benefiting both users and businesses through personalized suggestions and improved engagement. These advantages also create incentives for malicious actors to exploit them.
Recent studies reveal that modern recommender systems are vulnerable to data poisoning attacks, leading to unfair competition and loss of user trust. However, existing attack methods often have limited practicality, overestimating system robustness under real-world constraints.


Uncolorable Examples: Preventing Unauthorized AI Colorization via Perception-Aware Chroma-Restrictive Perturbation

Nii, Yuki, Waseda, Futa, Chang, Ching-Chun, Echizen, Isao

arXiv.org Artificial Intelligence

AI-based colorization has shown remarkable capability in generating realistic color images from grayscale inputs. However, it poses risks of copyright infringement -- for example, the unauthorized colorization and resale of monochrome manga and films. Despite these concerns, no effective method currently exists to prevent such misuse. To address this, we introduce the first defensive paradigm, Uncolorable Examples, which embed imperceptible perturbations into grayscale images to invalidate unauthorized colorization. To ensure real-world applicability, we establish four criteria: effectiveness, imperceptibility, transferability, and robustness. Our method, Perception-Aware Chroma-Restrictive Perturbation (PAChroma), generates Uncolorable Examples that meet these four criteria by optimizing imperceptible perturbations with a Laplacian filter to preserve perceptual quality, and applying diverse input transformations during optimization to enhance transferability across models and robustness against common post-processing (e.g., compression). Experiments on ImageNet and Danbooru datasets demonstrate that PAChroma effectively degrades colorization quality while maintaining the visual appearance. This work marks the first step toward protecting visual content from illegitimate AI colorization, paving the way for copyright-aware defenses in generative media.



Localizing Adversarial Attacks To Produce More Imperceptible Noise

Reddy, Pavan, Gujral, Aditya Sanjay

arXiv.org Artificial Intelligence

Adversarial attacks in machine learning traditionally focus on global perturbations to input data, yet the potential of localized adversarial noise remains underexplored. This study systematically evaluates localized adversarial attacks across widely-used methods, including FGSM, PGD, and C&W, to quantify their effectiveness, imperceptibility, and computational efficiency. By introducing a binary mask to constrain noise to specific regions, localized attacks achieve significantly lower mean pixel perturbations, higher Peak Signal-to-Noise Ratios (PSNR), and improved Structural Similarity Index (SSIM) compared to global attacks. However, these benefits come at the cost of increased computational effort and a modest reduction in Attack Success Rate (ASR). Our results highlight that iterative methods, such as PGD and C&W, are more robust to localization constraints than single-step methods like FGSM, maintaining higher ASR and imperceptibility metrics. This work provides a comprehensive analysis of localized adversarial attacks, offering practical insights for advancing attack strategies and designing robust defensive systems.
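The binary-mask localization described above can be sketched as a one-step FGSM variant. This is an illustrative toy, not the study's code: the mask placement, epsilon, and gradient values are hypothetical.

```python
import numpy as np

def masked_fgsm(x, grad, mask, eps=0.03):
    """Single-step FGSM variant where a binary mask confines the
    perturbation to a chosen region, producing localized noise."""
    return x + eps * mask * np.sign(grad)

# Toy example: a 4x4 "image", perturbed only in the top-left 2x2 patch.
x = np.zeros((4, 4))
grad = np.ones((4, 4))                 # hypothetical loss gradient w.r.t. x
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0                     # binary region mask
x_adv = masked_fgsm(x, grad, mask)
```

Iterative methods such as PGD apply the same mask at every step (and re-project after each update), which is consistent with the observation that they tolerate the localization constraint better than a single masked step.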