AITopics

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Irene Chen, Fredrik D. Johansson, David Sontag

Why Is My Classifier Discriminatory?

Neural Information Processing SystemsFeb-12-2026, 09:41:32 GMT

Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy.

artificial intelligence, discrimination, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.69)

Industry: Health & Medicine > Health Care Providers & Services (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Neural Information Processing SystemsNov-18-2025, 21:43:31 GMT

AudioMarkBench: Benchmarking Robustness of Audio Watermarking

The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios.

artificial intelligence, machine learning, perturbation, (15 more...)

Country:

North America > United States > New Hampshire (0.04)
Europe > France (0.04)
Asia > Taiwan (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Chevalley, Mathieu, Mehrjou, Arash, Schwab, Patrick

Theoretical Guarantees for Causal Discovery on Large Random Graphs

arXiv.org Artificial IntelligenceNov-5-2025

We investigate theoretical guarantees for the false-negative rate (FNR) -- the fraction of true causal edges whose orientation is not recovered, under single-variable random interventions and an $ε$-interventional faithfulness assumption that accommodates latent confounding. For sparse Erdős--Rényi directed acyclic graphs, where the edge probability scales as $p_e = Θ(1/d)$, we show that the FNR concentrates around its mean at rate $O(\frac{\log d}{\sqrt d})$, implying that large deviations above the expected error become exponentially unlikely as dimensionality increases. This concentration ensures that derived upper bounds hold with high probability in large-scale settings. Extending the analysis to generalized Barabási--Albert graphs reveals an even stronger phenomenon: when the degree exponent satisfies $γ> 3$, the deviation width scales as $O(d^{β- \frac{1}{2}})$ with $β= 1/(γ- 1) < \frac{1}{2}$, and hence vanishes in the limit. This demonstrates that realistic scale-free topologies intrinsically regularize causal discovery, reducing variability in orientation error. These finite-dimension results provide the first dimension-adaptive, faithfulness-robust guarantees for causal structure recovery, and challenge the intuition that high dimensionality and network heterogeneity necessarily hinder accurate discovery. Our simulation results corroborate these theoretical predictions, showing that the FNR indeed concentrates and often vanishes in practice as dimensionality grows.

artificial intelligence, intervention, machine learning, (17 more...)

2511.02536

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

arXiv.org Artificial IntelligenceNov-5-2025

A New Perspective on Precision and Recall for Generative Models

Sykes, Benjamin, Simon, Loïc, Rabin, Julien, Fadili, Jalal

With the recent success of generative models in image and text, the question of their evaluation has recently gained a lot of attention. While most methods from the state of the art rely on scalar metrics, the introduction of Precision and Recall (PR) for generative model has opened up a new avenue of research. The associated PR curve allows for a richer analysis, but their estimation poses several challenges. In this paper, we present a new framework for estimating entire PR curves based on a binary classification standpoint. We conduct a thorough statistical analysis of the proposed estimates. As a byproduct, we obtain a minimax upper bound on the PR estimation risk. We also show that our framework extends several landmark PR metrics of the literature which by design are restrained to the extreme values of the curve. Finally, we study the different behaviors of the curves obtained experimentally in various settings.

artificial intelligence, machine learning, precision and recall, (17 more...)

2511.02414

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Neural Information Processing SystemsOct-10-2025, 03:58:15 GMT

AudioMarkBench: Benchmarking Robustness of Audio Watermarking

no-box perturbation, perturbation, watermark-removal perturbation, (13 more...)

Country:

North America > United States > New Hampshire (0.04)
Europe > France (0.04)
Asia > Taiwan (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Jiang, Kevin, Dobriban, Edgar

Fair Classification by Direct Intervention on Operating Characteristics

arXiv.org Machine LearningOct-1-2025

We develop new classifiers under group fairness in the attribute-aware setting for binary classification with multiple group fairness constraints (e.g., demographic parity (DP), equalized odds (EO), and predictive parity (PP)). We propose a novel approach, applicable to linear fractional constraints, based on directly intervening on the operating characteristics of a pre-trained base classifier, by (i) identifying optimal operating characteristics using the base classifier's group-wise ROC convex hulls and (ii) post-processing the base classifier to match those targets. As practical post-processors, we consider randomizing a mixture of group-wise thresholding rules subject to minimizing the expected number of interventions. We further extend our approach to handle multiple protected attributes and multiple linear fractional constraints. On standard datasets (COMPAS and ACSIncome), our methods simultaneously satisfy approximate DP, EO, and PP with few interventions and a near-oracle drop in accuracy; comparing favorably to previous methods.

classifier, constraint, fairness constraint, (14 more...)

arXiv.org Machine Learning

2509.25481

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
North America > United States > Florida > Broward County (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Law (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

arXiv.org Artificial IntelligenceJul-22-2025

PromptArmor: Simple yet Effective Prompt Injection Defenses

Shi, Tianneng, Zhu, Kaijie, Wang, Zhun, Jia, Yuqi, Cai, Will, Liang, Weida, Wang, Haonan, Alzahrani, Hend, Lu, Joshua, Kawaguchi, Kenji, Alomair, Basel, Zhao, Xuandong, Wang, William Yang, Gong, Neil, Guo, Wenbo, Song, Dawn

Despite their potential, recent research has demonstrated that LLM agents are vulnerable to prompt injection attacks, where malicious prompts are injected into the agent's input, causing it to perform an attacker-specified task rather than the intended task provided by the user. In this paper, we present PromptArmor, a simple yet effective defense against prompt injection attacks. Specifically, PromptArmor prompts an off-the-shelf LLM to detect and remove potential injected prompts from the input before the agent processes it. Our results show that PromptArmor can accurately identify and remove injected prompts. For example, using GPT-4o, GPT-4.1, or o4-mini, PromptArmor achieves both a false positive rate and a false negative rate below 1% on the AgentDojo benchmark. Moreover, after removing injected prompts with PromptArmor, the attack success rate drops to below 1%. We also demonstrate PromptArmor's effectiveness against adaptive attacks and explore different strategies for prompting an LLM. We recommend that PromptArmor be adopted as a standard baseline for evaluating new defenses against prompt injection attacks.

large language model, machine learning, natural language, (17 more...)

2507.15219

Country: Asia > Thailand (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Piot, Paloma, Martín-Rodilla, Patricia, Parapar, Javier

Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models

arXiv.org Artificial IntelligenceMay-6-2025

Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This memory retains details such as user demographics and individual characteristics, allowing LLMs to adjust their behaviour based on personal information. However, the impact of integrating personalised information into the context has not been thoroughly assessed, leading to questions about its influence on LLM behaviour. Personalisation can be challenging, particularly with sensitive topics. In this paper, we examine various state-of-the-art LLMs to understand their behaviour in different personalisation scenarios, specifically focusing on hate speech. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this sensitive area. To mitigate these unwanted biases, we fine-tune the LLMs by penalising inconsistent hate speech classifications made with and without country or language-specific context. The refined models demonstrate improved performance in both personalised contexts and when no context is provided.

large language model, llama 3, machine learning, (19 more...)

2505.02252

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.66)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Eiras, Francisco, Zemour, Eliott, Lin, Eric, Mugunthan, Vaikkunth

Know Thy Judge: On the Robustness Meta-Evaluation of LLM Safety Judges

arXiv.org Artificial IntelligenceMar-6-2025

Large Language Model (LLM) based judges form the underpinnings of key safety evaluation processes such as offline benchmarking, automated red-teaming, and online guardrailing. This widespread requirement raises the crucial question: can we trust the evaluations of these evaluators? In this paper, we highlight two critical challenges that are typically overlooked: (i) evaluations in the wild where factors like prompt sensitivity and distribution shifts can affect performance and (ii) adversarial attacks that target the judge. We highlight the importance of these through a study of commonly used safety judges, showing that small changes such as the style of the model output can lead to jumps of up to 0.24 in the false negative rate on the same dataset, whereas adversarial attacks on the model generation can fool some judges into misclassifying 100% of harmful generations as safe ones. These findings reveal gaps in commonly used meta-evaluation benchmarks and weaknesses in the robustness of current LLM judges, indicating that low attack success under certain judges could create a false sense of security. Well-known jailbreak attacks on widely used Large Language Models (LLMs) such as ChatGPT have raised concerns about the robustness of these systems to safety violations. As a result, organizations deploying them typically rely on a two-pronged approach to safety: 1) offline benchmarking and red-teaming (Mazeika et al., 2024; Perez et al., 2022; Ganguli et al., 2022), and 2) online guardrails designed to minimize the risk from attacks (Mu et al., 2024; Manczak et al., 2024; Neill et al., 2024).

better workshop, dataset, evaluation, (15 more...)

2503.04474

Country: North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)