Goto

Collaborating Authors

 Performance Analysis


DRTA: Dynamic Reward Scaling for Reinforcement Learning in Time Series Anomaly Detection

arXiv.org Artificial Intelligence

Anomaly detection in time series data is important for applications in finance, healthcare, sensor networks, and industrial monitoring. Traditional methods usually struggle with limited labeled data, high false-positive rates, and difficulty generalizing to novel anomaly types. To overcome these challenges, we propose a reinforcement learning-based framework that integrates dynamic reward shaping, Variational Autoencoder (VAE), and active learning, called DRTA. Our method uses an adaptive reward mechanism that balances exploration and exploitation by dynamically scaling the effect of VAE-based reconstruction error and classification rewards. This approach enables the agent to detect anomalies effectively in low-label systems while maintaining high precision and recall. Our experimental results on the Yahoo A1 and Yahoo A2 benchmark datasets demonstrate that the proposed method consistently outperforms state-of-the-art unsupervised and semi-supervised approaches. These findings show that our framework is a scalable and efficient solution for real-world anomaly detection tasks.


Evaluating Logit-Based GOP Scores for Mispronunciation Detection

arXiv.org Artificial Intelligence

Pronunciation assessment relies on goodness of pronunciation (GOP) scores, traditionally derived from softmax-based posterior probabilities. However, posterior probabilities may suffer from overconfidence and poor phoneme separation, limiting their effectiveness. This study compares logit-based GOP scores with probability-based GOP scores for mispronunciation detection. We conducted our experiment on two L2 English speech datasets spoken by Dutch and Mandarin speakers, assessing classification performance and correlation with human ratings. Logit-based methods outperform probability-based GOP in classification, but their effectiveness depends on dataset characteristics. The maximum logit GOP shows the strongest alignment with human perception, while a combination of different GOP scores balances probability and logit features. The findings suggest that hybrid GOP methods incorporating uncertainty modeling and phoneme-specific weighting improve pronunciation assessment.


Enhancing GOP in CTC-Based Mispronunciation Detection with Phonological Knowledge

arXiv.org Artificial Intelligence

Computer-Assisted Pronunciation Training (CAPT) systems employ automatic measures of pronunciation quality, such as the goodness of pronunciation (GOP) metric. GOP relies on forced alignments, which are prone to labeling and segmentation errors due to acoustic variability. While alignment-free methods address these challenges, they are computationally expensive and scale poorly with phoneme sequence length and inventory size. To enhance efficiency, we introduce a substitution-aware alignment-free GOP that restricts phoneme substitutions based on phoneme clusters and common learner errors. We evaluated our GOP on two L2 English speech datasets, one with child speech, My Pronunciation Coach (MPC), and Spee-chOcean762, which includes child and adult speech. We compared RPS (restricted phoneme substitutions) and UPS (unrestricted phoneme substitutions) setups within alignment-free methods, which outperformed the baseline. We discuss our results and outline avenues for future research.


Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data

arXiv.org Artificial Intelligence

Speech impairments are prevalent biomarkers for Parkinson's Disease (PD), motivating the development of diagnostic techniques using speech data for clinical applications. Although deep acoustic features have shown promise for PD classification, their effectiveness often varies due to individual speaker differences, a factor that has not been thoroughly explored in the existing literature. This study investigates the effectiveness of three pre-trained audio embeddings (OpenL3, VGGish and Wav2Vec2.0 models) for PD classification. Using the NeuroVoz dataset, OpenL3 outperforms others in diadochokinesis (DDK) and listen and repeat (LR) tasks, capturing critical acoustic features for PD detection. Only Wav2Vec2.0 shows significant gender bias, achieving more favorable results for male speakers, in DDK tasks. The misclassified cases reveal challenges with atypical speech patterns, highlighting the need for improved feature extraction and model robustness in PD detection.


OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

arXiv.org Artificial Intelligence

However, current diffusion watermarking methods face significant limitations: zero-bit watermarking systems lack the capacity for large-scale user tracking, while multi-bit methods are highly sensitive to certain image transformations or generative attacks, resulting in a lack of comprehensive robustness. In this paper, we propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations, with tailored regularization terms to preserve image quality and ensure imperceptibility. To address the challenge of memory consumption growing linearly with the number of denoising steps during optimization, OptMark incorporates adjoint gradient methods, reducing memory usage from O ( N) to O (1). Experimental results demonstrate that Opt-Mark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.


Entropy-Based Non-Invasive Reliability Monitoring of Convolutional Neural Networks

arXiv.org Artificial Intelligence

Convolutional Neural Networks (CNNs) have become the foundation of modern computer vision, achieving unprecedented accuracy across diverse image recognition tasks. While these networks excel on in-distribution data, they remain vulnerable to adversarial perturbations imperceptible input modifications that cause misclassification with high confidence. However, existing detection methods either require expensive retraining, modify network architecture, or degrade performance on clean inputs. Here we show that adversarial perturbations create immediate, detectable entropy signatures in CNN activations that can be monitored without any model modification. Using parallel entropy monitoring on VGG-16, we demonstrate that adversarial inputs consistently shift activation entropy by 7% in early convolutional layers, enabling 90% detection accuracy with false positives and false negative rates below 20%. The complete separation between clean and adversarial entropy distributions reveals that CNNs inherently encode distribution shifts in their activation patterns. This work establishes that CNN reliability can be assessed through activation entropy alone, enabling practical deployment of self-diagnostic vision systems that detect adversarial inputs in real-time without compromising original model performance.


Activation Subspaces for Out-of-Distribution Detection

arXiv.org Artificial Intelligence

T o ensure the reliability of deep models in real-world applications, out-of-distribution (OOD) detection methods aim to distinguish samples close to the training distribution (in-distribution, ID) from those farther away (OOD). In this work, we propose a novel OOD detection method that utilizes singular value decomposition of the weight matrix of the classification head to decompose the model's activations into decisive and insignificant components, which contribute maximally, respectively minimally, to the final classifier output. W e find that the subspace of insignificant components more effectively distinguishes ID from OOD data than raw activations in regimes of large distribution shifts (Far-OOD). This occurs because the classification objective leaves the insignificant subspace largely unaffected, yielding features that are "untainted" by the target classification task. Conversely, in regimes of smaller distribution shifts (Near-OOD), we find that activation shaping methods profit from only considering the decisive subspace, as the insignificant component can cause interference in the activation space. By combining two findings into a single approach, termed ActSub, we achieve state-of-the-art results in various standard OOD benchmarks.


Accept or Deny? Evaluating LLM Fairness and Performance in Loan Approval across Table-to-Text Serialization Approaches

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly employed in high-stakes decision-making tasks, such as loan approvals. While their applications expand across domains, LLMs struggle to process tabular data, ensuring fairness and delivering reliable predictions. In this work, we assess the performance and fairness of LLMs on serialized loan approval datasets from three geographically distinct regions: Ghana, Germany, and the United States. Our evaluation focuses on the model's zero-shot and in-context learning (ICL) capabilities. Our results reveal that the choice of serialization (Serialization refers to the process of converting tabular data into text formats suitable for processing by LLMs.) format significantly affects both performance and fairness in LLMs, with certain formats such as GReat and LIFT yielding higher F1 scores but exacerbating fairness disparities. Notably, while ICL improved model performance by 4.9-59.6% relative to zero-shot baselines, its effect on fairness varied considerably across datasets. Our work underscores the importance of effective tabular data representation methods and fairness-aware models to improve the reliability of LLMs in financial decision-making.


Detecting Domain Shifts in Myoelectric Activations: Challenges and Opportunities in Stream Learning

arXiv.org Artificial Intelligence

Detecting domain shifts in myoelectric activations poses a significant challenge due to the inherent non-stationarity of electromyography (EMG) signals. This paper explores the detection of domain shifts using data stream (DS) learning techniques, focusing on the DB6 dataset from the Ninapro database. We define domains as distinct time-series segments based on different subjects and recording sessions, applying Kernel Principal Component Analysis (KPCA) with a cosine kernel to pre-process and highlight these shifts. By evaluating multiple drift detection methods such as CUSUM, Page-Hinckley, and ADWIN, we reveal the limitations of current techniques in achieving high performance for real-time domain shift detection in EMG signals. Our results underscore the potential of streaming-based approaches for maintaining stable EMG decoding models, while highlighting areas for further research to enhance robustness and accuracy in real-world scenarios.


CALM: A Framework for Continuous, Adaptive, and LLM-Mediated Anomaly Detection in Time-Series Streams

arXiv.org Artificial Intelligence

The detection of anomalies in non-stationary time-series streams is a critical but challenging task across numerous industrial and scientific domains. Traditional models, trained offline, suffer significant performance degradation when faced with concept drift, where the underlying statistical properties of the data change over time. This paper introduces CALM (Continuous, Adaptive, and LLM-Mediated), a novel, end-to-end framework for real-time anomaly detection designed to address this challenge. CALM is built on the Apache Beam distributed processing framework and leverages the TimesFm foundation model for forecasting-based anomaly detection. The framework's novelty lies in two core contributions. First, it implements a closed-loop, continuous fine-tuning mechanism that allows the anomaly detection model to adapt to evolving data patterns in near real-time. Second, it introduces an LLM-as-a-Judge component, a Large Language Model that provides semantic, context-aware judgments on detected anomalies to curate a high-quality training dataset, deciding whether an anomaly represents transient noise or a meaningful pattern shift. We evaluate CALM on the comprehensive TSB-UAD benchmark. Our results demonstrate that the continuously fine-tuned model improves the ROC AUC score in most datasets compared to the static, pre-trained base model, validating the efficacy of our adaptive, LLM-guided approach to maintaining high-performance anomaly detection in dynamic streaming environments.