AITopics | adversarial manipulation


adversarial manipulation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

Neural Information Processing Systems, Nov-21-2025, 16:10:51 GMT

Recent work has shown that state-of-the-art classifiers are quite brittle, in the sense that a small adversarial change to an input that was originally classified correctly with high confidence leads to a wrong classification, again with high confidence. This raises concerns that such classifiers are vulnerable to attacks and calls into question their use in safety-critical systems. In this paper we show, for the first time, formal guarantees on the robustness of a classifier by giving instance-specific lower bounds on the norm of the input manipulation required to change the classifier decision. Based on this analysis we propose the Cross-Lipschitz regularization functional. We show that using this form of regularization in kernel methods resp. …
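To make "instance-specific lower bound on the norm of the input manipulation" concrete, here is a minimal sketch of the linear special case only. As we understand the paper, its general bound for a differentiable classifier also involves a local Lipschitz constant of the gradient differences ∇f_c − ∇f_j; for a linear model f_j(x) = w_j·x + b_j those differences are constant, the Lipschitz term vanishes, and the bound becomes the exact L2 distance to the nearest pairwise decision boundary, (f_c(x) − f_j(x)) / ‖w_c − w_j‖₂. The function name `linear_robustness_radius` is hypothetical, and this is not the paper's general bound.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


def linear_robustness_radius(W, b, x):
    """Smallest L2 perturbation that lets another class tie with the predicted
    class, for a linear multiclass model f_j(x) = W[j].x + b[j].

    Because grad f_c - grad f_j = W[c] - W[j] is constant, no Lipschitz
    correction is needed and the returned radius is exact, not just a bound.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + bj for w, bj in zip(W, b)]
    c = max(range(len(scores)), key=scores.__getitem__)  # predicted class
    radius = math.inf
    for j in range(len(scores)):
        if j == c:
            continue
        diff = [wc - wj for wc, wj in zip(W[c], W[j])]
        norm = l2_norm(diff)
        if norm == 0:
            continue  # moving x can never change the order of classes c and j
        radius = min(radius, (scores[c] - scores[j]) / norm)
    return c, radius
```

For example, with `W = [[1, 0], [0, 1]]`, `b = [0, 0]` and `x = [2, 1]`, class 0 wins by a margin of 1 and the radius is 1/√2 ≈ 0.707: moving x by that amount along (−1, 1)/√2 lands exactly on the tie point (1.5, 1.5).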

adversarial manipulation, formal guarantee, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Adversarial Manipulation of Reasoning Models using Internal Representations

Yamaguchi, Kureha; Etheridge, Benjamin; Arditi, Andy

arXiv.org Artificial Intelligence, Aug-29-2025

Reasoning models generate chain-of-thought (CoT) tokens before their final output, but how this affects their vulnerability to jailbreak attacks remains unclear. While traditional language models make refusal decisions at the prompt-response boundary, we find evidence that DeepSeek-R1-Distill-Llama-8B makes these decisions within its CoT generation. We identify a linear direction in activation space during CoT token generation that predicts whether the model will refuse or comply (termed the "caution" direction because it corresponds to cautious reasoning patterns in the generated text). Ablating this direction from model activations increases harmful compliance, effectively jailbreaking the model. We additionally show that intervening only on CoT token activations suffices to control final outputs, and that incorporating this direction into prompt-based attacks improves success rates. Our findings suggest that the chain-of-thought itself is a promising new target for adversarial manipulation in reasoning models. Code available at https://github.com/ky295/reasoning-manipulation.
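The abstract describes ablating a single linear "caution" direction from activations. A common recipe in the refusal-direction literature is to estimate such a direction as the difference between mean activations on refusing and complying examples, normalize it, and then project it out of each activation: h' = h − (h·r̂) r̂. The sketch below assumes that recipe; the paper's actual estimation procedure, layers, and token positions may differ (see the linked repository). Activations are plain lists of floats standing in for a model's hidden states, and all function names are hypothetical.

```python
import math


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def unit(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def difference_in_means_direction(refuse_acts, comply_acts):
    """Unit vector from the mean 'comply' activation to the mean 'refuse' activation."""
    mu_refuse = mean(refuse_acts)
    mu_comply = mean(comply_acts)
    return unit([a - b for a, b in zip(mu_refuse, mu_comply)])


def ablate(h, r_hat):
    """Remove the component of activation h along the unit direction r_hat."""
    proj = sum(a * b for a, b in zip(h, r_hat))
    return [a - proj * b for a, b in zip(h, r_hat)]
```

After ablation, every activation is orthogonal to r̂, so a downstream linear readout along r̂ can no longer distinguish "cautious" from "compliant" states; in a real model this would be applied at each hooked layer during generation.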

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2507.03167

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.94)
Health & Medicine > Therapeutic Area > Immunology (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Reviews: Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

Neural Information Processing Systems, Oct-8-2024, 11:29:34 GMT

This paper fills an important gap in the literature on the robustness of classifiers to adversarial examples by proposing the first (to the best of my knowledge) formal guarantee (at an example level) on the robustness of a given classifier to adversarial examples. Unsurprisingly, the bound involves the Lipschitz constant of the Jacobians, which the authors exploit to propose a cross-Lipschitz regularization. Overall the paper is well written, and the material is well presented. The proof of Theorem 2.1 is correct. I did not check the proofs of Propositions 2.1 and 4.1. This is an interesting work.
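The review mentions the cross-Lipschitz regularization derived from the bound. As we recall the paper, the penalty averages squared differences of class-score gradients over the training points, Ω(f) = (1/(nK²)) Σᵢ Σ_{l,m} ‖∇f_l(xᵢ) − ∇f_m(xᵢ)‖₂², where K is the number of classes; treat the exact normalization as an assumption to check against the paper. For a linear model the gradient of f_l is simply its weight row, so the average over data points is trivial. The sketch below covers only that linear case, and the function name is hypothetical.

```python
def cross_lipschitz_penalty_linear(W):
    """Cross-Lipschitz-style penalty for a linear model f_l(x) = W[l].x + b[l].

    grad f_l = W[l] at every input, so the data average drops out and the
    penalty reduces to (1 / K^2) * sum over all class pairs (l, m) of
    ||W[l] - W[m]||_2^2 (ordered pairs, including l == m, which contribute 0).
    """
    K = len(W)
    total = 0.0
    for wl in W:
        for wm in W:
            total += sum((a - b) ** 2 for a, b in zip(wl, wm))
    return total / K**2
```

Adding this term to a training loss pushes the class weight vectors toward each other, which shrinks the denominator ‖w_c − w_j‖ of the linear robustness radius; the margin term in the numerator is what the data loss keeps large. For nonlinear models the gradients depend on xᵢ and would be obtained by automatic differentiation.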

adversarial manipulation, formal guarantee, robustness

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Assessing Neural Network Robustness via Adversarial Pivotal Tuning

Christensen, Peter Ebert; Snæbjarnarson, Vésteinn; Dittadi, Andrea; Belongie, Serge; Benaim, Sagie

arXiv.org Artificial Intelligence, Nov-17-2022

The ability to assess the robustness of image classifiers to a diverse set of manipulations is essential to their deployment in the real world. Recently, semantic manipulations of real images have been considered for this purpose, as they may not arise using standard adversarial settings. However, such semantic manipulations are often limited to style, color or attribute changes. While expressive, these manipulations do not consider the full capacity of a pretrained generator to effect adversarial image manipulations. In this work, we aim at leveraging the full capacity of a pretrained image generator to generate highly detailed, diverse and photorealistic image manipulations. Inspired by recent GAN-based image inversion methods, we propose a method called Adversarial Pivotal Tuning (APT). APT first finds a pivot latent space input to a pretrained generator that best reconstructs an input image. It then adjusts the weights of the generator to create small, but semantic, manipulations which fool a pretrained classifier. Crucially, APT changes both the input and the weights of the pretrained generator, while preserving its expressive latent editing capability, thus allowing the use of its full capacity in creating semantic adversarial manipulations. We demonstrate that APT generates a variety of semantic image manipulations, which preserve the input image class, but which fool a variety of pretrained classifiers. We further demonstrate that classifiers trained to be robust to other robustness benchmarks are not robust to our generated manipulations, and propose an approach to improve the robustness towards our generated manipulations. Code available at: https://captaine.github.io/apt/
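The two APT stages (find a pivot latent that reconstructs the input, then tune the generator weights so the output fools a classifier while staying close to the input) can be illustrated with a deliberately tiny scalar analogue. Everything here is an assumption for illustration: a linear "generator" G(z; a, b) = a·z + b, a threshold "classifier" that predicts class 1 when the output exceeds `threshold`, a softplus fooling loss, and a squared-distance penalty weighted by `lam`. The real method works with a pretrained GAN, image reconstruction losses and a pretrained image classifier; none of that is reproduced here.

```python
import math


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def toy_pivotal_tuning(x, a=2.0, b=0.0, threshold=1.0, lam=1.0, lr=0.1, steps=500):
    """Scalar analogue of the two APT stages (hypothetical toy, not the paper's code).

    Stage 1 (pivot): pick the latent z that reconstructs x with the generator frozen.
    Stage 2 (tuning): freeze z and update the generator weights (a, b) to minimise
        softplus(y - threshold) + lam * (y - x)^2,   where y = a * z + b,
    i.e. push the output below the classifier threshold while keeping it near x.
    """
    z = (x - b) / a  # exact inversion is possible for this linear toy generator
    for _ in range(steps):
        y = a * z + b
        dy = sigmoid(y - threshold) + 2.0 * lam * (y - x)  # dLoss/dy
        a -= lr * dy * z  # chain rule: dy/da = z
        b -= lr * dy      # chain rule: dy/db = 1
    return z, a * z + b
```

With `x = 1.1` (just above the threshold, so "class 1"), the pivot is z = 0.55 and tuning settles at an output of roughly 0.87: below the threshold, so the toy classifier is fooled, yet within about 0.23 of the original input. The penalty weight `lam` controls how far the tuned output may drift.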

artificial intelligence, machine learning, manipulation

arXiv.org Artificial Intelligence

2211.09782

Country:

North America > United States (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Artificial Intelligence: Too Fragile to Fight?

#artificialintelligence, Feb-1-2022, 08:05:04 GMT

You can become utterly dependent on a new glamorous technology, be it cyberspace, artificial intelligence… But does it create a potential Achilles' heel? Artificial intelligence (AI) has become the technical focal point for advancing naval and Department of Defense (DoD) capabilities. Secretary of the Navy Carlos Del Toro listed AI first among his priorities for innovating U.S. naval forces. Chief of Naval Operations Admiral Michael Gilday listed it as his top priority during his Senate confirmation hearing.

ai system, application, automation

#artificialintelligence

Country:

North America > United States (1.00)
Europe > France (0.04)
Asia > Middle East > Iran > Hormozgan Province > Bandar Abbas (0.04)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Formal Guarantees on the Robustness of a Classifier against Adversarial Manipulation

Hein, Matthias; Andriushchenko, Maksym

Neural Information Processing Systems, Feb-14-2020, 09:56:10 GMT

adversarial manipulation, classifier, formal guarantee

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Army scientists train machine learning models to wrangle dirty data

#artificialintelligence, Jan-19-2020, 19:46:43 GMT

Army researchers have developed a new approach for training machine learning models that can better withstand dirty and deceptive data. Models trained under this method have greatly surpassed other state-of-the-art models in terms of robustness, scientists said. Machines outperform humans in many data-processing tasks, but sometimes fall victim to obvious mistakes that humans can see a mile away. Scientists at the U.S. Army Combat Capabilities Development Command's Army Research Laboratory designed a new approach that makes it harder for adversaries to trick machine learning models. "We were able to reduce model complexity by about a factor of 10 without affecting other performance metrics under benign conditions," said Army scientist Dr. Ananthram Swami.

army scientist train machine, digit, new approach

#artificialintelligence

Country:

North America > United States > Maryland > Prince George's County > Adelphi (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Industry:

Government > Military > Army (1.00)
Government > Regional Government > North America Government > United States Government (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals

Huang, Yunhan; Zhu, Quanyan

arXiv.org Artificial Intelligence, Jun-24-2019

This paper studies reinforcement learning (RL) under malicious falsification on cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on $Q$-learning, we show that $Q$-learning algorithms converge under stealthy attacks and bounded falsifications on cost signals. We characterize the relation between the falsified cost and the $Q$-factors as well as the policy learned by the learning agent which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost which can mislead the agent to learn an adversary's favored policy. A numerical case study of water reservoir control is provided to show the potential hazards of RL in learning-based control systems and corroborate the results.
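The paper's formal results (convergence under bounded falsification, the robust region, conditions for a targeted policy) are not reproduced here. As a minimal illustration of the underlying effect, assume the simplest possible setting: one state, two actions, deterministic costs, and the cost-minimizing Q-learning update applied to every action at each step. Because both actions return to the same state, the converged Q-factors differ only by the cost difference, so a falsification smaller than the true cost gap (1.0 here) leaves the greedy policy unchanged, while a larger one flips it. The costs, γ, α and function names are hypothetical choices.

```python
def q_learning_single_state(costs, gamma=0.9, alpha=0.5, iters=2000):
    """Cost-minimising Q-learning on a one-state MDP, updating every action each step.

    Every action leads back to the same state, so the update is
        Q[a] <- Q[a] + alpha * (cost[a] + gamma * min(Q) - Q[a]).
    """
    q = [0.0] * len(costs)
    for _ in range(iters):
        v = min(q)  # value of the (only) next state under the greedy policy
        q = [qa + alpha * (c + gamma * v - qa) for qa, c in zip(q, costs)]
    return q


def greedy_action(q):
    """Action with the lowest estimated cost-to-go."""
    return min(range(len(q)), key=q.__getitem__)


true_costs = [1.0, 2.0]    # action 0 is genuinely cheaper
small_attack = [1.5, 2.0]  # adversary adds +0.5 to the observed cost of action 0
large_attack = [2.5, 2.0]  # adversary adds +1.5 to the observed cost of action 0

for name, costs in [("true", true_costs), ("small", small_attack), ("large", large_attack)]:
    print(name, greedy_action(q_learning_single_state(costs)))
```

With the true costs the agent prefers action 0 (Q ≈ [10, 11]); the +0.5 falsification still yields action 0, while the +1.5 falsification makes the agent learn the adversary's preferred action 1. In this toy, the set of falsifications that cannot flip the policy plays the role of the paper's robust region.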

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

1906.10571

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Genre: Research Report (0.70)

Industry:

Energy (0.93)
Information Technology > Security & Privacy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Fundamental Performance Limitation for Adversarial Classification

Makdah, Abed AlRahman Al; Katewa, Vaibhav; Pasqualetti, Fabio

arXiv.org Machine Learning, Mar-14-2019

Despite the widespread use of machine learning algorithms to solve problems of technological, economic, and social relevance, provable guarantees on the performance of these data-driven algorithms are critically lacking, especially when the data originates from unreliable sources and is transmitted over unprotected and easily accessible channels. In this paper we take an important step to bridge this gap and formally show that, in a quest to optimize their accuracy, binary classification algorithms -- including those based on machine-learning techniques -- inevitably become more sensitive to adversarial manipulation of the data. Further, for a given class of algorithms with the same complexity (i.e., number of classification boundaries), the fundamental tradeoff curve between accuracy and sensitivity depends solely on the statistics of the data, and cannot be improved by tuning the algorithm.

artificial intelligence, classifier, machine learning

arXiv.org Machine Learning

1903.01032

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions

Nguyen, Quynh; Mukkamala, Mahesh; Hein, Matthias

arXiv.org Machine Learning, Feb-28-2018

In the recent literature the important role of depth in deep learning has been emphasized. In this paper we argue that sufficient width of a feedforward network is equally important, by answering the simple question of under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks with a pyramidal structure, that is, in which no layer has more hidden units than the input dimension, necessarily produce connected decision regions. This implies that a sufficiently wide layer is necessary to produce disconnected decision regions. We discuss the implications of this result for the construction of neural networks, in particular the relation to the problem of adversarial manipulation of classifiers.
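A one-dimensional sanity check of the width claim, not the paper's proof: with a single input, a leaky-ReLU network whose hidden layer has 2 units (wider than the input dimension) can carve out the disconnected region |x| > about 1.02, while any network with a single hidden unit is a monotone function of x, so its positive region is one interval (or empty). The hand-picked weights, the leaky slope of 0.01, the grid, and the helper names are all hypothetical; counting runs of positive grid points stands in for counting connected components.

```python
def leaky_relu(u, slope=0.01):
    return u if u > 0 else slope * u


def wide_net(x):
    # Hidden layer of width 2 (> input dimension 1), hand-picked weights.
    # Positive exactly when |x| > 1.01 / 0.99, i.e. a disconnected region.
    return leaky_relu(x - 1.0) + leaky_relu(-x - 1.0)


def narrow_net(x, w, b, v, c):
    # Hidden layer of width 1 (= input dimension): a pyramidal network.
    # leaky_relu is strictly increasing, so this is monotone in x.
    return v * leaky_relu(w * x + b) + c


def count_positive_runs(f, lo=-3.0, hi=3.0, n=601):
    """Count maximal runs of grid points where f(x) > 0 (a 1-D proxy for connected components)."""
    runs, inside = 0, False
    for i in range(n):
        x = lo + (hi - lo) * i / (n - 1)
        positive = f(x) > 0
        if positive and not inside:
            runs += 1
        inside = positive
    return runs
```

`count_positive_runs(wide_net)` finds two separate positive runs, one on each side of the origin, whereas every choice of `(w, b, v, c)` for `narrow_net` gives at most one.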

artificial intelligence, decision region, machine learning

arXiv.org Machine Learning

1803.00094

Country:

Europe > Germany > Saarland (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
