AITopics | semantic robustness

Collaborating Authors

semantic robustness

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models

Pinneri, Cristina, Louizos, Christos

arXiv.org Artificial IntelligenceNov-17-2025

Guard models are a critical component of LLM safety, but their sensitivity to superficial linguistic variations remains a key vulnerability. We show that even meaning-preserving paraphrases can cause large fluctuations in safety scores, revealing a lack of semantic grounding. To address this, we introduce a practical, self-supervised framework for improving the semantic robustness of guard models. Our method leverages paraphrase sets to enforce prediction consistency using a novel, skew-aware aggregation strategy for robust target computation. Notably, we find that standard aggregation methods like mean and median can degrade safety, underscoring the need for skew-aware alternatives. We analyze six open-source guard models and show that our approach reduces semantic variability across paraphrases by ~58%, improves benchmark accuracy by ~2.5% on average, and generalizes to unseen stylistic variations. Intriguingly, we discover a bidirectional relationship between model calibration and consistency: our robustness training improves calibration by up to 40%, revealing a fundamental connection between these properties. These results highlight the value of treating semantic consistency as a first-class training objective and provide a scalable recipe for building more reliable guard models.

calibration, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.10665

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Monitoring Robustness and Individual Fairness

Gupta, Ashutosh, Henzinger, Thomas A., Kueffner, Konstantin, Mallik, Kaushik, Pape, David

arXiv.org Artificial IntelligenceJun-3-2025

Input-output robustness appears in various different forms in the literature, such as robustness of AI models to adversarial or semantic perturbations and individual fairness of AI models that make decisions about humans. We propose runtime monitoring of input-output robustness of deployed, black-box AI models, where the goal is to design monitors that would observe one long execution sequence of the model, and would raise an alarm whenever it is detected that two similar inputs from the past led to dissimilar outputs. This way, monitoring will complement existing offline ``robustification'' approaches to increase the trustworthiness of AI decision-makers. We show that the monitoring problem can be cast as the fixed-radius nearest neighbor (FRNN) search problem, which, despite being well-studied, lacks suitable online solutions. We present our tool Clemont, which offers a number of lightweight monitors, some of which use upgraded online variants of existing FRNN algorithms, and one uses a novel algorithm based on binary decision diagrams -- a data-structure commonly used in software and hardware verification. We have also developed an efficient parallelization technique that can substantially cut down the computation time of monitors for which the distance between input-output pairs is measured using the $L_\infty$ norm. Using standard benchmarks from the literature of adversarial and semantic robustness and individual fairness, we perform a comparative study of different monitors in \tool, and demonstrate their effectiveness in correctly detecting robustness violations at runtime.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3711896.3737054

2506.00496

Country:

Europe (0.93)
North America > United States > New York (0.28)

Genre: Research Report (0.82)

Industry: Information Technology (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The King is Naked: on the Notion of Robustness for Natural Language Processing

La Malfa, Emanuele, Kwiatkowska, Marta

arXiv.org Artificial IntelligenceJan-11-2022

There is growing evidence that the classical notion of adversarial robustness originally introduced for images has been adopted as a de facto standard by a large part of the NLP research community. We show that this notion is problematic in the context of NLP as it considers a narrow spectrum of linguistic phenomena. In this paper, we argue for semantic robustness, which is better aligned with the human concept of linguistic fidelity. We characterize semantic robustness in terms of biases that it is expected to induce in a model. We study semantic robustness of a range of vanilla and robustly trained architectures using a template-based generative test bed. We complement the analysis with empirical evidence that, despite being harder to implement, semantic robustness can improve performance %gives guarantees for on complex linguistic phenomena where models robust in the classical sense fail.

linguistic phenomenon, robustness, semantic robustness, (16 more...)

arXiv.org Artificial Intelligence

2112.07605

Country:

North America > United States > Hawaii (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.47)
Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Semantic Robustness of Models of Source Code

Ramakrishnan, Goutham, Henkel, Jordan, Wang, Zi, Albarghouthi, Aws, Jha, Somesh, Reps, Thomas

arXiv.org Machine LearningFeb-7-2020

Deep neural networks are vulnerable to adversarial examples - small input perturbations that result in incorrect predictions. We study this problem in the context of models of source code, where we want the network to be robust to source-code modifications that preserve code functionality. We define a natural notion of robustness, $k$-transformation robustness, in which an adversary performs up to $k$ semantics-preserving transformations to an input program. We show how to train robust models using an adversarial training objective inspired by that of Madry et al. (2018) for continuous domains. We implement an extensible framework for adversarial training over source code, and conduct a thorough evaluation on a number of datasets and two different architectures. Our results show (1) the increase in robustness following adversarial training, (2) the ability of training on weak adversaries to provide robustness to attacks by stronger adversaries, and (3) the shift in attribution focus of adversarially trained models towards semantic vs. syntactic features.

adversary, dataset, transformation, (15 more...)

arXiv.org Machine Learning

2002.03043

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback