AITopics | adversarially

Predicting the Performance of Black-box Language Models with Follow-up Queries

Neural Information Processing Systems | Jun-14-2026, 06:41:37 GMT

Reliably predicting the behavior of language models, such as whether their outputs are correct or have been adversarially manipulated, is a fundamentally challenging task. This is often made even more difficult because frontier language models are offered only through closed-source APIs, providing only black-box access. In this paper, we predict the behavior of black-box language models by asking follow-up questions and taking the probabilities of the responses as representations to train reliable predictors. We first demonstrate that training a linear model on these responses reliably and accurately predicts model correctness on question-answering and reasoning benchmarks. Surprisingly, this can even outperform white-box linear predictors that operate over model internals or activations. Furthermore, we demonstrate that these follow-up question responses can reliably distinguish between a clean version of an LLM and one that has been adversarially influenced via a system prompt to answer questions incorrectly or to introduce bugs into generated code. Finally, we show that they can also be used to differentiate between black-box LLMs, enabling the detection of misrepresented models provided through an API. Overall, our work shows promise in monitoring black-box language model behavior, supporting their deployment in larger, autonomous systems.
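The idea of training a linear model on follow-up response probabilities can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the features are synthetic draws from Beta distributions standing in for the probability a model assigns to an affirmative reply to each follow-up question, and the function names (`simulate_followup_probs`, `fit_linear_predictor`, `predict_correct`) are hypothetical. It is not the paper's pipeline, only a small logistic-regression example of the same shape.

```python
import numpy as np

def simulate_followup_probs(n_samples, n_questions, rng):
    """Synthetic stand-in (assumption): one probability per follow-up question.

    Label 1 means the original answer was correct. Correct answers are given
    higher affirmative probabilities on average than incorrect ones.
    """
    labels = rng.integers(0, 2, size=n_samples)
    a = np.where(labels == 1, 5.0, 2.0)[:, None]
    b = np.where(labels == 1, 2.0, 5.0)[:, None]
    probs = rng.beta(a, b, size=(n_samples, n_questions))
    return probs, labels

def fit_linear_predictor(probs, labels, lr=0.5, steps=300):
    """Logistic regression by gradient descent on centered probabilities."""
    X = probs - 0.5                      # center around the probability midpoint
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - labels                 # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ err) / len(labels)
        b -= lr * err.mean()
    return w, b

def predict_correct(probs, w, b):
    """Return 1 where the linear predictor judges the answer correct."""
    return ((probs - 0.5) @ w + b > 0).astype(int)

rng = np.random.default_rng(0)
train_x, train_y = simulate_followup_probs(400, 8, rng)
test_x, test_y = simulate_followup_probs(200, 8, rng)
w, b = fit_linear_predictor(train_x, train_y)
accuracy = (predict_correct(test_x, w, b) == test_y).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

In a real setting the synthetic features would be replaced by probabilities read from an API that exposes token likelihoods; how well such a predictor works on actual models is an empirical question that this toy data does not answer.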

large language model, natural language, proceedings, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

A Unified Game-Theoretic Interpretation of Adversarial Robustness: Supplementary Material

Neural Information Processing Systems | Apr-25-2026, 01:01:28 GMT

In this section, to help readers understand the metric in the paper, we first revisit the definition of the Shapley value [14], which is widely considered an unbiased estimate of the numerical importance of each input variable. In game theory, a complex system is usually represented as a game, where each input variable is taken as a player and the output of the system is regarded as the total reward of all players. Given a game with multiple players (input variables) $N = \{1, 2, \dots, n\}$, some players cooperate to pursue a high reward. Thus, the task is to divide the total reward and fairly assign the divided elementary reward to each individual player. In this way, the elementary reward can be considered as the numerical importance of the corresponding variable to the complex system. Let $2^N \overset{\text{def}}{=} \{S \mid S \subseteq N\}$ denote all potential subsets of $N$. The game $v: 2^N \to \mathbb{R}$ is a function that estimates the overall reward $v(S)$ earned by each specific subset of players $S \subseteq N$. In this way, the Shapley value, denoted by $\phi(i)$, represents the numerical importance of player $i$ to the game $v$: $\phi(i) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]$. Using Shapley values to explain DNNs.
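As a small illustration of the Shapley value, the sketch below computes it exactly for a toy game by averaging each player's marginal contribution over all subsets of the other players, using the standard weights |S|!(n-|S|-1)!/n!. The toy game and the names `shapley_values` and `toy_game` are hypothetical and chosen only for this example; players are indexed from 0 rather than 1, and the enumeration is exponential in n, so it is only practical for a handful of players.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1 of a game v(frozenset) -> float."""
    players = range(n)
    phi = []
    for i in players:
        others = [j for j in players if j != i]
        total = 0.0
        for size in range(n):
            # Weight of every coalition S of this size that excludes player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (v(S | {i}) - v(S))  # marginal contribution of i
        phi.append(total)
    return phi

# Hypothetical toy game: additive weights plus a bonus when players 0 and 1 cooperate.
weights = [1.0, 2.0, 3.0]

def toy_game(S):
    bonus = 1.0 if {0, 1} <= S else 0.0
    return sum(weights[j] for j in S) + bonus

phi = shapley_values(3, toy_game)
print(phi)
```

In this toy game the cooperation bonus is split evenly between players 0 and 1, and the values sum to the reward of the full coalition, which is the fairness property the text refers to.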

artificial intelligence, deep learning, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia (0.28)

Industry: Leisure & Entertainment > Games (0.54)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

1f4fe6a4411edc2ff625888b4093e917-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 01:01:25 GMT

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: Asia > China (0.15)

Genre: Research Report (0.68)

Technology:

Information Technology > Game Theory (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Scanning Trojaned Models Using Out-of-Distribution Samples

Neural Information Processing Systems | Mar-22-2026, 20:11:43 GMT

Scanning for trojans (backdoors) in deep neural networks is crucial due to their significant real-world applications. There has been an increasing focus on developing effective general trojan scanning methods across various trojan attacks. Despite advancements, there remains a shortage of methods that perform effectively without preconceived assumptions about the backdoor attack method. Additionally, we have observed that current methods struggle to identify classifiers trojaned using adversarial training. Motivated by these challenges, our study introduces a novel scanning method named TRODO (TROjan scanning by Detection of adversarial shifts in Out-of-distribution samples).

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

Tolerant Algorithms for Learning with Arbitrary Covariate Shift

Neural Information Processing Systems | Mar-22-2026, 17:35:23 GMT

We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: PQ learning [GKKM'20], allowing abstention on adversarially generated parts of the test distribution, and TDS learning [KSV'23], permitting abstention on the entire test distribution if distribution shift is detected. All prior known algorithms either rely on learning primitives that are computationally hard even for simple function classes, or end up abstaining entirely even in the presence of a tiny amount of distribution shift. We address both these challenges for natural function classes, including intersections of halfspaces and decision trees, and standard training distributions, including Gaussians. For PQ learning, we give efficient learning algorithms, while for TDS learning, our algorithms can tolerate moderate amounts of distribution shift. At the core of our approach is an improved analysis of spectral outlier-removal techniques from learning with nasty noise. Our analysis can (1) handle an arbitrarily large fraction of outliers, which is crucial for handling arbitrary distribution shifts, and (2) obtain stronger bounds on polynomial moments of the distribution after outlier removal, yielding new insights into polynomial regression under distribution shifts. Lastly, our techniques lead to novel results for tolerant [RV'23], and learning with nasty noise.
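To give a rough sense of what spectral outlier removal means, here is a generic sketch of a spectral filter: while the largest eigenvalue of the empirical second-moment matrix is suspiciously large, drop the points with the largest squared projection onto the top eigenvector. This is not the paper's algorithm or analysis. The function name `spectral_filter`, the eigenvalue bound, the drop fraction, and the planted-outlier data are all assumptions chosen for illustration, and the sketch assumes inliers that are roughly standard Gaussian.

```python
import numpy as np

def spectral_filter(X, eig_bound=2.0, drop_frac=0.05, max_rounds=20):
    """Return a boolean mask of points kept by a simple spectral filter.

    eig_bound and drop_frac are hypothetical tuning values, not taken
    from any paper.
    """
    keep = np.ones(len(X), dtype=bool)
    for _ in range(max_rounds):
        pts = X[keep]
        second_moment = pts.T @ pts / len(pts)
        eigvals, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues ascending
        if eigvals[-1] <= eig_bound:
            break                                         # no direction looks inflated
        scores = (X @ eigvecs[:, -1]) ** 2                # squared projection on top direction
        cutoff = np.quantile(scores[keep], 1.0 - drop_frac)
        keep &= scores <= cutoff                          # drop the highest-scoring points
    return keep

# Synthetic data (assumption): 950 standard Gaussian inliers in 5 dimensions,
# plus 50 outliers planted far out along the first coordinate.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(950, 5))
outliers = 0.1 * rng.normal(size=(50, 5))
outliers[:, 0] += 10.0
X = np.vstack([inliers, outliers])

keep = spectral_filter(X)
print("inliers kept:", int(keep[:950].sum()), "outliers kept:", int(keep[950:].sum()))
```

The planted outliers inflate the second moment along one direction, which is what makes them visible to the eigenvalue check. Adversarial outliers can be much subtler, and handling them at arbitrarily large fractions is the hard part that the abstract attributes to the improved analysis.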

artificial intelligence, distribution shift, machine learning, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Explicit Tradeoffs between Adversarial and Natural Distributional Robustness

Neural Information Processing Systems | Feb-19-2026, 19:59:41 GMT

Results averaged over ResNet18 and ResNet50.

artificial intelligence, machine learning, spurious feature, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
North America > United States > California (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

cfee398643cbc3dc5eefc89334cacdc1-Paper.pdf

Neural Information Processing Systems | Feb-19-2026, 07:23:14 GMT

accuracy, fine-tuning, robust accuracy, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Appendices

Neural Information Processing Systems | Feb-19-2026, 02:44:20 GMT

The supplementary material is organized as follows. We first discuss additional related work and provide experiment details in Section 2 and Appendix B, respectively. Adversarial Defenses: Neural networks trained using standard procedures such as SGD are extremely vulnerable [23] to norm-bounded adversarial attacks such as FGSM [23], PGD [42], CW [11], and Momentum [17]; unrestricted attacks [7, 19] can significantly degrade model performance as well. Defense strategies based on heuristics such as feature squeezing [82], denoising [80], encoding [10], specialized nonlinearities [83] and distillation [56] have had limited success against stronger attacks [2]. Then, we introduce a noisy version of the 5-slab block, which we later use in Appendix D.

artificial intelligence, arxivpreprintarxiv, machine learning, (16 more...)

Neural Information Processing Systems

Country: Europe > Italy > Tuscany > Florence (0.04)

Industry: Information Technology > Security & Privacy (0.34)

Technology: