AITopics

People living with dementia (PLwD) often show gradual shifts in how they communicate, becoming less expressive, more repetitive, or drifting off-topic in subtle ways. While caregivers may notice these changes informally, most computational tools are not designed to track such behavioral drift over time. This paper introduces PersonaDrift, a synthetic benchmark designed to evaluate machine learning and statistical methods for detecting progressive changes in daily communication, focusing on user responses to a digital reminder system. PersonaDrift simulates 60-day interaction logs for synthetic users modeled after real PLwD, based on interviews with caregivers. These caregiver-informed personas vary in tone, modality, and communication habits, enabling realistic diversity in behavior. The benchmark focuses on two forms of longitudinal change that caregivers highlighted as particularly salient: flattened sentiment (reduced emotional tone and verbosity) and off-topic replies (semantic drift). These changes are injected progressively at different rates to emulate naturalistic cognitive trajectories, and the framework is designed to be extensible to additional behaviors in future use cases. To explore this novel application space, we evaluate several anomaly detection approaches, unsupervised statistical methods (CUSUM, EWMA, One-Class SVM), sequence models using contextual embeddings (GRU + BERT), and supervised classifiers in both generalized and personalized settings. Preliminary results show that flattened sentiment can often be detected with simple statistical models in users with low baseline variability, while detecting semantic drift requires temporal modeling and personalized baselines. Across both tasks, personalized classifiers consistently outperform generalized ones, highlighting the importance of individual behavioral context.

data mining, machine learning, persona, (18 more...)

2511.16445

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Dementia (0.62)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Q-MLLM: Vector Quantization for Robust Multimodal Large Language Model Security

Zhao, Wei, Li, Zhe, Li, Yige, Sun, Jun

Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in cross-modal understanding, but remain vulnerable to adversarial attacks through visual inputs despite robust textual safety mechanisms. These vulnerabilities arise from two core weaknesses: the continuous nature of visual representations, which allows for gradient-based attacks, and the inadequate transfer of text-based safety mechanisms to visual content. We introduce Q-MLLM, a novel architecture that integrates two-level vector quantization to create a discrete bottleneck against adversarial attacks while preserving multimodal reasoning capabilities. By discretizing visual representations at both pixel-patch and semantic levels, Q-MLLM blocks attack pathways and bridges the cross-modal safety alignment gap. Our two-stage training methodology ensures robust learning while maintaining model utility. Experiments demonstrate that Q-MLLM achieves significantly better defense success rate against both jailbreak attacks and toxic image attacks than existing approaches. Notably, Q-MLLM achieves perfect defense success rate (100\%) against jailbreak attacks except in one arguable case, while maintaining competitive performance on multiple utility benchmarks with minimal inference overhead. This work establishes vector quantization as an effective defense mechanism for secure multimodal AI systems without requiring expensive safety-specific fine-tuning or detection overhead. Code is available at https://github.com/Amadeuszhao/QMLLM.

large language model, machine learning, natural language, (19 more...)

2511.16229

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Zanbaghi, Shahin, Rostampour, Ryan, Abid, Farhan, Jarmakani, Salim Al

Detecting Sleeper Agents in Large Language Models via Semantic Drift Analysis

Large Language Models (LLMs) can be backdoored to exhibit malicious behavior under specific deployment conditions while appearing safe during training a phenomenon known as "sleeper agents." Recent work by Hubinger et al. demonstrated that these backdoors persist through safety training, yet no practical detection methods exist. We present a novel dual-method detection system combining semantic drift analysis with canary baseline comparison to identify backdoored LLMs in real-time. Our approach uses Sentence-BERT embeddings to measure semantic deviation from safe baselines, complemented by injected canary questions that monitor response consistency. Evaluated on the official Cadenza-Labs dolphin-llama3-8B sleeper agent model, our system achieves 92.5% accuracy with 100% precision (zero false positives) and 85% recall. The combined detection method operates in real-time (<1s per query), requires no model modification, and provides the first practical solution to LLM backdoor detection. Our work addresses a critical security gap in AI deployment and demonstrates that embedding-based detection can effectively identify deceptive model behavior without sacrificing deployment efficiency.

backdoor, large language model, machine learning, (17 more...)

2511.15992

Country: North America > Canada > Ontario (0.14)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Zhang, Jingru, Moradi, Saed, Saha, Ashirbani

Externally Validated Multi-Task Learning via Consistency Regularization Using Differentiable BI-RADS Features for Breast Ultrasound Tumor Segmentation

Multi-task learning can suffer from destructive task interference, where jointly trained models underperform single-task baselines and limit generalization. To improve generalization performance in breast ultrasound-based tumor segmentation via multi-task learning, we propose a novel consistency regularization approach that mitigates destructive interference between segmentation and classification. The consistency regularization approach is composed of differentiable BI-RADS-inspired morphological features. We validated this approach by training all models on the BrEaST dataset (Poland) and evaluating them on three external datasets: UDIAT (Spain), BUSI (Egypt), and BUS-UCLM (Spain). Our comprehensive analysis demonstrates statistically significant (p<0.001) improvements in generalization for segmentation task of the proposed multi-task approach vs. the baseline one: UDIAT, BUSI, BUS-UCLM (Dice coefficient=0.81 vs 0.59, 0.66 vs 0.56, 0.69 vs 0.49, resp.). The proposed approach also achieves state-of-the-art segmentation performance under rigorous external validation on the UDIAT dataset.

artificial intelligence, classification, machine learning, (15 more...)

2511.15968

Country:

Europe > Spain (0.45)
Africa > Middle East > Egypt (0.24)
North America > Canada > Ontario > Hamilton (0.15)

Genre: Research Report > Experimental Study (0.35)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.95)
Health & Medicine > Therapeutic Area (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

Ramakrishnan, Badrinath, Balaji, Akshaya

Securing AI Agents Against Prompt Injection Attacks

Retrieval-augmented generation (RAG) systems have become widely used for enhancing large language model capabilities, but they introduce significant security vulnerabilities through prompt injection attacks. We present a comprehensive benchmark for evaluating prompt injection risks in RAG-enabled AI agents and propose a multi-layered defense framework. Our benchmark includes 847 adversarial test cases across five attack categories: direct injection, context manipulation, instruction override, data exfiltration, and cross-context contamination. We evaluate three defense mechanisms: content filtering with embedding-based anomaly detection, hierarchical system prompt guardrails, and multi-stage response verification, across seven state-of-the-art language models. Our combined framework reduces successful attack rates from 73.2% to 8.7% while maintaining 94.3% of baseline task performance. We release our benchmark dataset and defense implementation to support future research in AI agent security.

large language model, machine learning, natural language, (17 more...)

2511.15759

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.90)
(2 more...)

Kamen, Ariel, Kamen, Yakov

Majority Rules: LLM Ensemble is a Winning Approach for Content Categorization

This study introduces an ensemble framework for unstructured text categorization using large language models (LLMs). By integrating multiple models, the ensemble large language model (eLLM) framework addresses common weaknesses of individual systems, including inconsistency, hallucination, category inflation, and misclassification. The eLLM approach yields a substantial performance improvement of up to 65\% in F1-score over the strongest single model. We formalize the ensemble process through a mathematical model of collective decision-making and establish principled aggregation criteria. Using the Interactive Advertising Bureau (IAB) hierarchical taxonomy, we evaluate ten state-of-the-art LLMs under identical zero-shot conditions on a human-annotated corpus of 8{,}660 samples. Results show that individual models plateau in performance due to the compression of semantically rich text into sparse categorical representations, while eLLM improves both robustness and accuracy. With a diverse consortium of models, eLLM achieves near human-expert-level performance, offering a scalable and reliable solution for taxonomy-based classification that may significantly reduce dependence on human expert labeling.

large language model, machine learning, natural language, (20 more...)

2511.15714

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Formal Models and Convergence Analysis for Context-Aware Security Verification

Chaudhary, Ayush

Traditional security scanners fail when facing new attack patterns they haven't seen before. They rely on fixed rules and predetermined signatures, making them blind to novel threats. We present a fundamentally different approach: instead of memorizing specific attack patterns, we learn what makes systems genuinely secure. Our key insight is simple yet powerful: context determines vulnerability. A SQL query that's safe in one environment becomes dangerous in another. By modeling this context-vulnerability relationship, we achieve something remarkable: our system detects attacks it has never seen before. We introduce context-aware verification that learns from genuine system behavior. Through reconstruction learning on secure systems, we capture their essential characteristics. When an unknown attack deviates from these patterns, our system recognizes it, even without prior knowledge of that specific attack type. We prove this capability theoretically, showing detection rates improve exponentially with context information I(W;C). Our framework combines three components: (1) reconstruction learning that models secure behavior, (2) multi-scale graph reasoning that aggregates contextual clues, and (3) attention mechanisms guided by reconstruction differences. Extensive experiments validate our approach: detection accuracy jumps from 58 percent to 82 percent with full context, unknown attack detection improves by 31 percent, and our system maintains above 90 percent accuracy even against completely novel attack vectors.

large language model, machine learning, natural language, (19 more...)

2510.1244

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
(3 more...)

Timmer, Roelien C, Hou, Yufang, Wan, Stephen

A Position Paper on the Automatic Generation of Machine Learning Leaderboards

An important task in machine learning (ML) research is comparing prior work, which is often performed via ML leaderboards: a tabular overview of experiments with comparable conditions (e.g., same task, dataset, and metric). However, the growing volume of literature creates challenges in creating and maintaining these leaderboards. To ease this burden, researchers have developed methods to extract leaderboard entries from research papers for automated leaderboard curation. Yet, prior work varies in problem framing, complicating comparisons and limiting real-world applicability. In this position paper, we present the first overview of Automatic Leaderboard Generation (ALG) research, identifying fundamental differences in assumptions, scope, and output formats. We propose an ALG unified conceptual framework to standardise how the ALG task is defined. We offer ALG benchmarking guidelines, including recommendations for datasets and metrics that promote fair, reproducible evaluation. Lastly, we outline challenges and new directions for ALG, such as, advocating for broader coverage by including all reported results and richer metadata.

large language model, machine learning, natural language, (22 more...)

doi: 10.18653/v1/2025.emnlp-main.1566

2505.17465

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Ho, Sy-Tuyen, Hao, Koh Jun, Nguyen, Ngoc-Bao, Binder, Alexander, Cheung, Ngai-Man

Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment

Model Inversion (MI) attacks aim to reconstruct information from private training data by exploiting access to machine learning models T. To evaluate such attacks, the standard evaluation framework relies on an evaluation model E, trained under the same task design as T. This framework has become the de facto standard for assessing progress in MI research, used across nearly all recent MI studies without question. In this paper, we present the first in-depth study of this evaluation framework. In particular, we identify a critical issue of this standard framework: Type-I adversarial examples. These are reconstructions that do not capture the visual features of private training data, yet are still deemed successful by T and ultimately transferable to E. Such false positives undermine the reliability of the standard MI evaluation framework. To address this issue, we introduce a new MI evaluation framework that replaces the evaluation model E with advanced Multimodal Large Language Models (MLLMs). By leveraging their general-purpose visual understanding, our MLLM-based framework does not depend on training of shared task design as in T, thus reducing Type-I transferability and providing more faithful assessments of reconstruction success. Using our MLLM-based evaluation framework, we reevaluate 27 diverse MI attack setups and empirically reveal consistently high false positive rates under the standard evaluation framework. Importantly, we demonstrate that many state-of-the-art (SOTA) MI methods report inflated attack accuracy, indicating that actual privacy leakage is significantly lower than previously believed. By uncovering this critical issue and proposing a robust solution, our work enables a reassessment of progress in MI research and sets a new standard for reliable and robust evaluation. Code can be found in https://github.com/hosytuyen/MI-Eval-MLLM

large language model, machine learning, natural language, (20 more...)

2505.03519

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Ding, Jingqiu, Hua, Yiding, Lindberg, Kasper, Steurer, David, Storozhenko, Aleksandr

Rate-optimal community detection near the KS threshold via node-robust algorithms

arXiv.org Machine LearningNov-21-2025

We study community detection in the \emph{symmetric $k$-stochastic block model}, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$, respectively. Our main result is a polynomial-time algorithm that achieves the minimax-optimal misclassification rate \begin{equation*} \exp \Bigl(-\bigl(1 \pm o(1)\bigr) \tfrac{C}{k}\Bigr), \quad \text{where } C = (\sqrt{pn} - \sqrt{qn})^2, \end{equation*} whenever $C \ge K\,k^2\,\log k$ for some universal constant $K$, matching the Kesten--Stigum (KS) threshold up to a $\log k$ factor. Notably, this rate holds even when an adversary corrupts an $η\le \exp\bigl(- (1 \pm o(1)) \tfrac{C}{k}\bigr)$ fraction of the nodes. To the best of our knowledge, the minimax rate was previously only attainable either via computationally inefficient procedures [ZZ15] or via polynomial-time algorithms that require strictly stronger assumptions such as $C \ge K k^3$ [GMZZ17]. In the node-robust setting, the best known algorithm requires the substantially stronger condition $C \ge K k^{102}$ [LM22]. Our results close this gap by providing the first polynomial-time algorithm that achieves the minimax rate near the KS threshold in both settings. Our work has two key technical contributions: (1) we robustify majority voting via the Sum-of-Squares framework, (2) we develop a novel graph bisection algorithm via robust majority voting, which allows us to significantly improve the misclassification rate to $1/\mathrm{poly}(k)$ for the initial estimation near the KS threshold.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2511.16613

Country: Europe (0.45)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)