AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case

Longuet, Delphine, Elouazzani, Amira, Riveiros, Alejandro Penacho, Bastianello, Nicola

arXiv.org Artificial IntelligenceNov-19-2025

Failures in satellite components are costly and challenging to address, often requiring significant human and material resources. Embedding a hybrid AI-based system for fault detection directly in the satellite can greatly reduce this burden by allowing earlier detection. However, such systems must operate with extremely high reliability. To ensure this level of dependability, we employ the formal verification tool Marabou to verify the local robustness of the neural network models used in the AI-based algorithm. This tool allows us to quantify how much a model's input can be perturbed before its output behavior becomes unstable, thereby improving trustworthiness with respect to its performance under uncertainty.

artificial intelligence, histogram, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.4204/EPTCS.436.4

2509.03948

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

Add feedback

AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection

Wang, Peiran, Liu, Yang, Lu, Yunfei, Cai, Yifeng, Chen, Hongbo, Yang, Qingyou, Zhang, Jie, Hong, Jue, Wu, Ye

arXiv.org Artificial IntelligenceNov-19-2025

Large Language Model (LLM) agents offer a powerful new paradigm for solving various problems by combining natural language reasoning with the execution of external tools. However, their dynamic and non-transparent behavior introduces critical security risks, particularly in the presence of prompt injection attacks. In this work, we propose a novel insight that treats the agent runtime traces as structured programs with analyzable semantics. Thus, we present AgentArmor, a program analysis framework that converts agent traces into graph intermediate representation-based structured program dependency representations (e.g., CFG, DFG, and PDG) and enforces security policies via a type system. AgentArmor consists of three key components: (1) a graph constructor that reconstructs the agent's runtime traces as graph-based intermediate representations with control and data flow described within; (2) a property registry that attaches security-relevant metadata of interacted tools \& data, and (3) a type system that performs static inference and checking over the intermediate representation. By representing agent behavior as structured programs, AgentArmor enables program analysis for sensitive data flow, trust boundaries, and policy violations. We evaluate AgentArmor on the AgentDojo benchmark, the results show that AgentArmor can reduce the ASR to 3\%, with the utility drop only 1\%.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2508.01249

Genre: Research Report > New Finding (0.65)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Preference Learning with Lie Detectors can Induce Honesty or Evasion

Cundy, Chris, Gleave, Adam

arXiv.org Artificial IntelligenceNov-19-2025

As AI systems become more capable, deceptive behaviors can undermine evaluation and mislead users at deployment. Recent work has shown that lie detectors can accurately classify deceptive behavior, but they are not typically used in the training pipeline due to concerns around contamination and objective hacking. We examine these concerns by incorporating a lie detector into the labelling step of LLM post-training and evaluating whether the learned policy is genuinely more honest, or instead learns to fool the lie detector while remaining deceptive. Using DolusChat, a novel 65k-example dataset with paired truthful/deceptive responses, we identify three key factors that determine the honesty of learned policies: amount of exploration during preference learning, lie detector accuracy, and KL regularization strength. We find that preference learning with lie detectors and GRPO can lead to policies which evade lie detectors, with deception rates of over 85\%. However, if the lie detector true positive rate (TPR) or KL regularization is sufficiently high, GRPO learns honest policies. In contrast, off-policy algorithms (DPO) consistently lead to deception rates under 25\% for realistic TPRs. Our results illustrate a more complex picture than previously assumed: depending on the context, lie-detector-enhanced training can be a powerful tool for scalable oversight, or a counterproductive method encouraging undetectable misalignment.

detector, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2505.13787

Country:

Asia (0.28)
Europe (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (1.00)

Industry:

Transportation (1.00)
Telecommunications (1.00)
Law (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AudioMarkBench: Benchmarking Robustness of Audio Watermarking

Neural Information Processing SystemsNov-18-2025, 21:43:31 GMT

The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios.

artificial intelligence, machine learning, perturbation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > New Hampshire (0.04)
Europe > France (0.04)
Asia > Taiwan (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Media (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

7baa48bc166aa2013d78cbdc15010530-Paper-Conference.pdf

Neural Information Processing SystemsNov-18-2025, 16:01:28 GMT

dimension, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
(10 more...)

Genre: Research Report > New Finding (0.46)

Industry: Government > Military > Navy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Add feedback

MMDCP: A Distribution-free Approach to Outlier Detection and Classification with Coverage Guarantees and SCW-FDR Control

Lin, Youwu, Qian, Xiaoyu, Wu, Jinru, Liu, Qi, Wang, Pei

arXiv.org Machine LearningNov-18-2025

We propose the Modified Mahalanobis Distance Conformal Prediction (MMDCP), a unified framework for multi-class classification and outlier detection under label shift, where the training and test distributions may differ. In such settings, many existing methods construct nonconformity scores based on empirical cumulative or density functions combined with data-splitting strategies. However, these approaches are often computationally expensive due to their heavy reliance on resampling procedures and tend to produce overly conservative prediction sets with unstable coverage, especially in small samples. To address these challenges, MMDCP combines class-specific distance measures with full conformal prediction to construct a score function, thereby producing adaptive prediction sets that effectively capture both inlier and outlier structures. Under mild regularity conditions, we establish convergence rates for the resulting sets and provide the first theoretical characterization of the gap between oracle and empirical conformal $p$-values, which ensures valid coverage and effective control of the class-wise false discovery rate (CW-FDR). We further introduce the Summarized Class-Wise FDR (SCW-FDR), a novel global error metric aggregating false discoveries across classes, and show that it can be effectively controlled within the MMDCP framework. Extensive simulations and two real-data applications support our theoretical findings and demonstrate the advantages of the proposed method.

data mining, machine learning, prediction, (16 more...)

arXiv.org Machine Learning

2511.12016

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

A Review of Statistical and Machine Learning Approaches for Coral Bleaching Assessment

Sarkar, Soham, Hazra, Arnab

arXiv.org Machine LearningNov-18-2025

Coral bleaching is a major concern for marine ecosystems; more than half of the world's coral reefs have either bleached or died over the past three decades. Increasing sea surface temperatures, along with various spatiotemporal environmental factors, are considered the primary reasons behind coral bleaching. The statistical and machine learning communities have focused on multiple aspects of the environment in detail. However, the literature on various stochastic modeling approaches for assessing coral bleaching is extremely scarce. Data-driven strategies are crucial for effective reef management, and this review article provides an overview of existing statistical and machine learning methods for assessing coral bleaching. Statistical frameworks, including simple regression models, generalized linear models, generalized additive models, Bayesian regression models, spatiotemporal models, and resilience indicators, such as Fisher's Information and Variance Index, are commonly used to explore how different environmental stressors influence coral bleaching. On the other hand, machine learning methods, including random forests, decision trees, support vector machines, and spatial operators, are more popular for detecting nonlinear relationships, analyzing high-dimensional data, and allowing integration of heterogeneous data from diverse sources. In addition to summarizing these models, we also discuss potential data-driven future research directions, with a focus on constructing statistical and machine learning models in specific contexts related to coral bleaching.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

2511.12234

Country:

North America > United States (0.14)
Indian Ocean > Red Sea (0.04)
Asia > Middle East > Yemen (0.04)
(11 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.68)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.87)
(2 more...)

Add feedback

Person-AI Bidirectional Fit - A Proof-Of-Concept Case Study Of Augmented Human-Ai Symbiosis In Management Decision-Making Process

Bieńkowska, Agnieszka, Małecki, Jacek, Mathiesen-Ohman, Alexander, Tworek, Katarzyna

arXiv.org Artificial IntelligenceNov-18-2025

This article develops the concept of Person-AI bidirectional fit, defined as the continuously evolving, context-sensitive alignment-primarily cognitive, but also emotional and behavioral-between a human decision-maker and an artificial intelligence system. Grounded in contingency theory and quality theory, the study examines the role of P-AI fit in managerial decision-making through a proof-of-concept case study involving a real hiring process for a Senior AI Lead. Three decision pathways are compared: (1) independent evaluations by a CEO, CTO, and CSO; (2) an evaluation produced by an augmented human-AI symbiotic intelligence system (H3LIX-LAIZA); and (3) an assessment generated by a general-purpose large language model. The results reveal substantial role-based divergence in human judgments, high alignment between H3LIX-LAIZA and the CEOs implicit decision model-including ethical disqualification of a high-risk candidate and a critical false-positive recommendation from the LLMr. The findings demonstrate that higher P-AI fit, exemplified by the CEO H3LIX-LAIZA relationship, functions as a mechanism linking augmented symbiotic intelligence to accurate, trustworthy, and context-sensitive decisions. The study provides an initial verification of the P-AI fit construct and a proof-of-concept for H3LIX-LAIZA as an augmented human-AI symbiotic intelligence system.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.1367

Country: Europe (0.28)

Genre: Research Report > Experimental Study (0.86)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(2 more...)

Add feedback

AI Fairness Beyond Complete Demographics: Current Achievements and Future Directions

Wang, Zichong, Yin, Zhipeng, Yap, Roland H. C., Zhang, Wenbin

arXiv.org Artificial IntelligenceNov-18-2025

Fairness in artificial intelligence (AI) has become a growing concern due to discriminatory outcomes in AI-based decision-making systems. While various methods have been proposed to mitigate bias, most rely on complete demographic information, an assumption often impractical due to legal constraints and the risk of reinforcing discrimination. This survey examines fairness in AI when demographics are incomplete, addressing the gap between traditional approaches and real-world challenges. We introduce a novel taxonomy of fairness notions in this setting, clarifying their relationships and distinctions. Additionally, we summarize existing techniques that promote fairness beyond complete demographics and highlight open research questions to encourage further progress in the field.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.13525

Country:

North America > United States (0.67)
Asia (0.46)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (0.93)
Law > Civil Rights & Constitutional Law (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Add feedback

A Quantum Tensor Network-Based Viewpoint for Modeling and Analysis of Time Series Data

Vipulananthan, Pragatheeswaran, Premaratne, Kamal, Sarkar, Dilip, Murthi, Manohar N.

arXiv.org Artificial IntelligenceNov-18-2025

Accurate uncertainty quantification is a critical challenge in machine learning. While neural networks are highly versatile and capable of learning complex patterns, they often lack interpretability due to their ``black box'' nature. On the other hand, probabilistic ``white box'' models, though interpretable, often suffer from a significant performance gap when compared to neural networks. To address this, we propose a novel quantum physics-based ``white box'' method that offers both accurate uncertainty quantification and enhanced interpretability. By mapping the kernel mean embedding (KME) of a time series data vector to a reproducing kernel Hilbert space (RKHS), we construct a tensor network-inspired 1D spin chain Hamiltonian, with the KME as one of its eigen-functions or eigen-modes. We then solve the associated Schr{ö}dinger equation and apply perturbation theory to quantify uncertainty, thereby improving the interpretability of tasks performed with the quantum tensor network-based model. We demonstrate the effectiveness of this methodology, compared to state-of-the-art ``white box" models, in change point detection and time series clustering, providing insights into the uncertainties associated with decision-making throughout the process.

artificial intelligence, hamiltonian, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICKG63256.2024.00054

2511.13514

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback