AITopics | Performance Analysis

Collaborating Authors

Performance Analysis

News Overviews Instructional Materials AI-Alerts Classics

RedHerring Attack: Testing the Reliability of Attack Detection

arXiv.org Artificial IntelligenceSep-26-2025

In response to adversarial text attacks, attack detection models have been proposed and shown to successfully identify text modified by adversaries. Attack detection models can be leveraged to provide an additional check for NLP models and give signals for human input. However, the reliability of these models has not yet been thoroughly explored. Thus, we propose and test a novel attack setting and attack, RedHerring. RedHerring aims to make attack detection models unreliable by modifying a text to cause the detection model to predict an attack, while keeping the classifier correct. This creates a tension between the classifier and detector. If a human sees that the detector is giving an ``incorrect'' prediction, but the classifier a correct one, then the human will see the detector as unreliable. We test this novel threat model on 4 datasets against 3 detectors defending 4 classifiers. We find that RedHerring is able to drop detection accuracy between 20 - 71 points, while maintaining (or improving) classifier accuracy. As an initial defense, we propose a simple confidence check which requires no retraining of the classifier or detector and increases detection accuracy greatly. This novel threat model offers new insights into how adversaries may target detection models.

classifier, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.20691

Country:

Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Philosophy-informed Machine Learning

Naser, MZ

arXiv.org Artificial IntelligenceSep-26-2025

A deep dive into the open literature shows that there are t hree fundamental limitations to current ML approaches, namely blackbox brittleness (which renders models uninterpretable and unreliable under distribution shift [2]), causal blindness (which conflates correlation with causation [3]), and alignment failures (which produce systems optimizing objectives misaligned with human values [4]) . These deficiencies stem from a profound philosophical poverty in how ML conceptualizes knowledge, reasoning, and values. The first fundamental limitation, b lackbox brittleness, manifests when trained models fail on seemingly trivial variations of their training distribution. For example, a vision model that accurately identifies stop signs under normal conditions might misclassify them entirely when small adversarial perturbations are applied [5] . Not surprisingly, t h e same brittleness extends beyond adversarial examples to everyday distribution shifts (e.g., natural language processing models exhibit performance degradation when processing text from different cultural contexts, etc.) [6] .

constraint, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.2037

Country: North America > United States (0.67)

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Law (0.67)
Health & Medicine > Therapeutic Area (0.33)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(5 more...)

Add feedback

21b5883bc8fec922fdbbb06675388164-Paper-Conference.pdf

Neural Information Processing SystemsSep-25-2025, 10:33:19 GMT

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (0.93)
North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (0.70)
Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees

Bashari, Meshi, Lee, Yonghoon, Lotan, Roy Maor, Dobriban, Edgar, Romano, Yaniv

arXiv.org Machine LearningSep-25-2025

The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard inference method using only real data when synthetic data is of low quality. The error of our method remains below a user-specified bound without any distributional assumptions on the synthetic data, and decreases as the quality of the synthetic data improves. This flexibility enables seamless integration with conformal prediction, risk control, hypothesis testing, and multiple testing procedures, all without modifying the base inference method. We demonstrate the benefits of our method on challenging tasks with limited labeled data, including AlphaFold protein structure prediction, and comparing large reasoning models on complex math problems.

alg, prediction, synthetic data, (16 more...)

arXiv.org Machine Learning

2509.20345

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

MAGIC: Multi-task Gaussian process for joint imputation and classification in healthcare time series

Ku, Dohyun, Chong, Catherine D., Berisha, Visar, Schwedt, Todd J., Li, Jing

arXiv.org Machine LearningSep-25-2025

Time series analysis has emerged as an important tool for improving patient diagnosis and management in healthcare applications. However, these applications commonly face two critical challenges: time misalignment and data sparsity. Traditional approaches address these issues through a two-step process of imputation followed by prediction. We propose MAGIC (Multi-tAsk Gaussian Process for Imputation and Classification), a novel unified framework that simultaneously performs class-informed missing value imputation and label prediction within a hierarchical multi-task Gaussian process coupled with functional logistic regression. To handle intractable likelihood components, MAGIC employs Taylor expansion approximations with bounded error analysis, and parameter estimation is performed using EM algorithm with block coordinate optimization supported by convergence analysis. We validate MAGIC through two healthcare applications: prediction of post-traumatic headache improvement following mild traumatic brain injury and prediction of in-hospital mortality within 48 hours after ICU admission. In both applications, MAGIC achieves superior predictive accuracy compared to existing methods. The ability to generate real-time and accurate predictions with limited samples facilitates early clinical assessment and treatment planning, enabling healthcare providers to make more informed treatment decisions.

magic, prediction, time sery, (14 more...)

arXiv.org Machine Learning

2509.19577

Country:

North America > United States > Arizona (0.04)
Europe > Romania (0.04)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)
(2 more...)

Add feedback

Hybrid Pipeline SWD Detection in Long-Term EEG Signals

Rincon, Antonio Quintero, Masino, Nicolas, Marsico, Veronica, Batatia, Hadj

arXiv.org Machine LearningSep-25-2025

Spike-and-wave discharges (SWDs) are the electroencephalographic hallmark of absence epilepsy, yet their manual identification in multi-day recordings remains labour-intensive and error-prone. We present a lightweight hybrid pipeline that couples analytical features with a shallow artificial neural network (ANN) for accurate, patient-specific SWD detection in long-term, monopolar EEG. A two-sided moving-average (MA) filter first suppresses the high-frequency components of normal background activity. The residual signal is then summarised by the mean and the standard deviation of its normally distributed samples, yielding a compact, two-dimensional feature vector for every 20s window. These features are fed to a single-hidden-layer ANN trained via back-propagation to classify each window as SWD or non-SWD. The method was evaluated on 780 channels sampled at 256 Hz from 12 patients, comprising 392 annotated SWD events. It correctly detected 384 events (sensitivity: 98%) while achieving a specificity of 96.2 % and an overall accuracy of 97.2%. Because feature extraction is analytic, and the classifier is small, the pipeline runs in real-time and requires no manual threshold tuning. These results indicate that normal-distribution descriptors combined with a modest ANN provide an effective and computationally inexpensive solution for automated SWD screening in extended EEG recordings.

detection, discharge, neural network, (14 more...)

arXiv.org Machine Learning

doi: 10.1007/978-3-032-06401-1_90

2509.19387

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Middle East > UAE > Dubai Emirate > Dubai (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

PGCLODA: Prompt-Guided Graph Contrastive Learning for Oligopeptide-Infectious Disease Association Prediction

Tan, Dayu, Chen, Jing, Zhou, Xiaoping, Su, Yansen, Zheng, Chunhou

arXiv.org Artificial IntelligenceSep-25-2025

Infectious diseases continue to pose a serious threat to public health, underscoring the urgent need for effective computational approaches to screen novel anti-infective agents. Oligopeptides have emerged as promising candidates in antimicrobial research due to their structural simplicity, high bioavailability, and low susceptibility to resistance. Despite their potential, computational models specifically designed to predict associations between oligopeptides and infectious diseases remain scarce. This study introduces a prompt-guided graph-based contrastive learning framework (PGCLODA) to uncover potential associations. A tripartite graph is constructed with oligopeptides, microbes, and diseases as nodes, incorporating both structural and semantic information. To preserve critical regions during contrastive learning, a prompt-guided graph augmentation strategy is employed to generate meaningful paired views. A dual encoder architecture, integrating Graph Convolutional Network (GCN) and Transformer, is used to jointly capture local and global features. The fused embeddings are subsequently input into a multilayer perceptron (MLP) classifier for final prediction. Experimental results on a benchmark dataset indicate that PGCLODA consistently outperforms state-of-the-art models in AUROC, AUPRC, and accuracy. Ablation and hyperparameter studies confirm the contribution of each module. Case studies further validate the generalization ability of PGCLODA and its potential to uncover novel, biologically relevant associations. These findings offer valuable insights for mechanism-driven discovery and oligopeptide-based drug development. The source code of PGCLODA is available online at https://github.com/jjnlcode/PGCLODA.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.2029

Country:

North America (0.28)
Asia > China > Anhui Province (0.14)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback

CANDLE: A Cross-Modal Agentic Knowledge Distillation Framework for Interpretable Sarcopenia Diagnosis

Jin, Yuqi, Shuai, Zhenhao, Hu, Zihan, Zhang, Weiteng, Xie, Weihao, Shuai, Jianwei, Shen, Xian, Feng, Zhen

arXiv.org Artificial IntelligenceSep-25-2025

Background and Aims: Large language models (LLMs) have shown remarkable generalization and transfer capabilities by learning from vast corpora of text and web data. Their semantic representations allow cross-task knowledge transfer and reasoning, offering promising opportunities for data-scarce and heterogeneous domains such as clinical medicine. Yet, in diagnostic tasks like sarcopenia, major challenges remain: interpretability, transparency, and deployment efficiency. Traditional machine learning (TML) models provide stable performance and feature-level attribution, ensuring traceable and auditable decision logic, but lack semantic breadth. Conversely, LLMs enable flexible inference but often function as opaque predictors. Existing integration strategies remain shallow, rarely embedding the structured reasoning of TML into LLM inference. Methods: Using sarcopenia diagnosis as a case study, SHapley Additive exPlanations (SHAP) were extracted from a baseline XGBoost model and transformed into structured, LLM-compatible representations. An actor-critic reinforcement learning (RL) strategy guided the LLM to reason over these SHAP-based inputs, producing calibrated rationales and refined decision rules. The distilled reasoning was consolidated into a structured knowledge repository and deployed via retrieval-augmented generation (RAG) for case-based inference. Results: (Omitted here.) Conclusion: By coupling SHAP-derived statistical evidence with reinforcement-trained LLM reasoning, CANDLE mitigates the interpretability-performance trade-off, enhances predictive accuracy, and preserves high decision consistency. The framework offers a scalable approach to knowledge assetization of TML models, enabling interpretable, reproducible, and clinically aligned decision support in sarcopenia and potentially broader medical domains.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.21179

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
Health & Medicine > Consumer Health (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection

Macko, Dominik

arXiv.org Artificial IntelligenceSep-25-2025

The large language models (LLMs) are able to generate high-quality texts in multiple languages. Such texts are often not recognizable by humans as generated, and therefore present a potential of LLMs for misuse (e.g., plagiarism, spams, disinformation spreading). An automated detection is able to assist humans to indicate the machine-generated texts; however, its robustness to out-of-distribution data is still challenging. This notebook describes our mdok approach in robust detection, based on fine-tuning smaller LLMs for text classification. It is applied to both subtasks of Voight-Kampff Generative AI Detection 2025, providing remarkable performance (1st rank) in both, the binary detection as well as the multiclass classification of various cases of human-AI collaboration.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.01702

Country:

Europe (1.00)
Asia (0.94)
North America > United States (0.46)
North America > Mexico (0.28)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Filters

Collaborating Authors

Performance Analysis

RedHerring Attack: Testing the Reliability of Attack Detection

Philosophy-informed Machine Learning

2567c95fd41459a98a73ba893775d22a-Paper-Conference.pdf

21b5883bc8fec922fdbbb06675388164-Paper-Conference.pdf

Statistical Inference Leveraging Synthetic Data with Distribution-Free Guarantees

MAGIC: Multi-task Gaussian process for joint imputation and classification in healthcare time series

Hybrid Pipeline SWD Detection in Long-Term EEG Signals

PGCLODA: Prompt-Guided Graph Contrastive Learning for Oligopeptide-Infectious Disease Association Prediction

CANDLE: A Cross-Modal Agentic Knowledge Distillation Framework for Interpretable Sarcopenia Diagnosis

mdok of KInIT: Robustly Fine-tuned LLM for Binary and Multiclass AI-Generated Text Detection