Performance Analysis
From Data to Decision: A Multi-Stage Framework for Class Imbalance Mitigation in Optical Network Failure Analysis
Ali, Yousuf Moiz, Prilepsky, Jaroslaw E., Sambo, Nicola, Pedro, Joao, Hosseini, Mohammad M., Napoli, Antonio, Turitsyn, Sergei K., Freire, Pedro
Machine learning-based failure management in optical networks has gained significant attention in recent years. However, severe class imbalance, where normal instances vastly outnumber failure cases, remains a considerable challenge. While pre- and in-processing techniques have been widely studied, post-processing methods are largely unexplored. In this work, we present a direct comparison of pre-, in-, and post-processing approaches for class imbalance mitigation in failure detection and identification using an experimental dataset. For failure detection, post-processing methods-particularly Threshold Adjustment-achieve the highest F1 score improvement (up to 15.3%), while Random Under-Sampling provides the fastest inference. In failure identification, GenAI methods deliver the most substantial performance gains (up to 24.2%), whereas post-processing shows limited impact in multi-class settings. When class overlap is present and latency is critical, over-sampling methods such as the SMOTE are most effective; without latency constraints, Meta-Learning yields the best results. In low-overlap scenarios, Generative AI approaches provide the highest performance with minimal inference time.
ARTPS: Depth-Enhanced Hybrid Anomaly Detection and Learnable Curiosity Score for Autonomous Rover Target Prioritization
We present ARTPS (Autonomous Rover Target Prioritization System), a novel hybrid AI system that combines depth estimation, anomaly detection, and learnable curiosity scoring for autonomous exploration of planetary surfaces. Our approach integrates monocular depth estimation using Vision Transformers with multi-component anomaly detection and a weighted curiosity score that balances known value, anomaly signals, depth variance, and surface roughness. The system achieves state-of-the-art performance with AUROC of 0.94, AUPRC of 0.89, and F1-Score of 0.87 on Mars rover datasets. We demonstrate significant improvements in target prioritization accuracy through ablation studies and provide comprehensive analysis of component contributions. The hybrid fusion approach reduces false positives by 23% while maintaining high detection sensitivity across diverse terrain types.
Exploring the Efficacy of Convolutional Neural Networks in Sleep Apnea Detection from Single Channel EEG
Sleep apnea, a prevalent sleep disorder, involves repeated episodes of breathing interruptions during sleep, leading to various health complications, including cognitive impairments, high blood pressure, heart disease, stroke, and even death. One of the main challenges in diagnosing and treating sleep apnea is identifying individuals at risk. The current gold standard for diagnosis, Polysomnography (PSG), is costly, labor intensive, and inconvenient, often resulting in poor quality sleep data. This paper presents a novel approach to the detection of sleep apnea using a Convolutional Neural Network (CNN) trained on single channel EEG data. The proposed CNN achieved an accuracy of 85.1% and a Matthews Correlation Coefficient (MCC) of 0.22, demonstrating a significant potential for home based applications by addressing the limitations of PSG in automated sleep apnea detection. Key contributions of this work also include the development of a comprehensive preprocessing pipeline with an Infinite Impulse Response (IIR) Butterworth filter, a dataset construction method providing broader temporal context, and the application of SMOTETomek to address class imbalance. This research underscores the feasibility of transitioning from traditional laboratory based diagnostics to more accessible, automated home based solutions, improving patient outcomes and broadening the accessibility of sleep disorder diagnostics.
VistaWise: Building Cost-Effective Agent with Cross-Modal Knowledge Graph for Minecraft
Fu, Honghao, Ren, Junlong, Chai, Qi, Ye, Deheng, Cai, Yujun, Wang, Hao
Large language models (LLMs) have shown significant promise in embodied decision-making tasks within virtual open-world environments. Nonetheless, their performance is hindered by the absence of domain-specific knowledge. Methods that finetune on large-scale domain-specific data entail prohibitive development costs. This paper introduces VistaWise, a cost-effective agent framework that integrates cross-modal domain knowledge and finetunes a dedicated object detection model for visual analysis. It reduces the requirement for domain-specific training data from millions of samples to a few hundred. VistaWise integrates visual information and textual dependencies into a cross-modal knowledge graph (KG), enabling a comprehensive and accurate understanding of multimodal environments. We also equip the agent with a retrieval-based pooling strategy to extract task-related information from the KG, and a desktop-level skill library to support direct operation of the Minecraft desktop client via mouse and keyboard inputs. Experimental results demonstrate that VistaWise achieves state-of-the-art performance across various open-world tasks, highlighting its effectiveness in reducing development costs while enhancing agent performance.
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
Liu, Changsong, Peng, Yizhou, Chng, Eng Siong
Contextual automatic speech recognition (ASR) systems allow for recognizing out-of-vocabulary (OOV) words, such as named entities or rare words. However, it remains challenging due to limited training data and ambiguous or inconsistent pronunciations. In this paper, we propose a synthesis-driven multi-pronunciation contextual biasing method that performs zero-shot contextual ASR on a pretrained Whisper model. Specifically, we leverage text-to-speech (TTS) systems to synthesize diverse speech samples containing each target rare word, and then use the pretrained Whisper model to extract multiple predicted pronunciation variants. These variant token sequences are compiled into a prefix-trie, which assigns rewards to beam hypotheses in a shallow-fusion manner during beam-search decoding. Subsequently, any recognized variant is mapped back to the original rare word in the final transcription. The evaluation results on the LibriSpeech dataset show that our method reduces biased-word error rate (B-WER) by 43% on test-clean and 44% on test-other while maintaining unbiased-WER (U-WER) essentially unchanged.
Segmentation and Classification of Pap Smear Images for Cervical Cancer Detection Using Deep Learning
Albzour, Nisreen, Lam, Sarah S.
Cervical cancer remains a significant global health concern and a leading cause of cancer-related deaths among women. Early detection through Pap smear tests is essential to reduce mortality rates; however, the manual examination is time consuming and prone to human error. This study proposes a deep learning framework that integrates U-Net for segmentation and a classification model to enhance diagnostic performance. The Herlev Pap Smear Dataset, a publicly available cervical cell dataset, was utilized for training and evaluation. The impact of segmentation on classification performance was evaluated by comparing the model trained on segmented images and another trained on non-segmented images. Experimental results showed that the use of segmented images marginally improved the model performance on precision (about 0.41 percent higher) and F1-score (about 1.30 percent higher), which suggests a slightly more balanced classification performance. While segmentation helps in feature extraction, the results showed that its impact on classification performance appears to be limited. The proposed framework offers a supplemental tool for clinical applications, which may aid pathologists in early diagnosis.
Text Meets Topology: Rethinking Out-of-distribution Detection in Text-Rich Networks
Wang, Danny, Qiu, Ruihong, Bai, Guangdong, Huang, Zi
Out-of-distribution (OOD) detection remains challenging in text-rich networks, where textual features intertwine with topological structures. Existing methods primarily address label shifts or rudimentary domain-based splits, overlooking the intricate textual-structural diversity. For example, in social networks, where users represent nodes with textual features (name, bio) while edges indicate friendship status, OOD may stem from the distinct language patterns between bot and normal users. To address this gap, we introduce the TextTopoOOD framework for evaluating detection across diverse OOD scenarios: (1) attribute-level shifts via text augmentations and embedding perturbations; (2) structural shifts through edge rewiring and semantic connections; (3) thematically-guided label shifts; and (4) domain-based divisions. Furthermore, we propose TNT-OOD to model the complex interplay between Text aNd Topology using: 1) a novel cross-attention module to fuse local structure into node-level text representations, and 2) a HyperNetwork to generate node-specific transformation parameters. This aligns topological and semantic features of ID nodes, enhancing ID/OOD distinction across structural and textual shifts. Experiments on 11 datasets across four OOD scenarios demonstrate the nuanced challenge of TextTopoOOD for evaluating OOD detection in text-rich networks.
Learning local and global prototypes with optimal transport for unsupervised anomaly detection and localization
Trombetta, Robin, Lartizien, Carole
Unsupervised anomaly detection aims to detect defective parts of a sample by having access, during training, to a set of normal, i.e. defect-free, data. It has many applications in fields, such as industrial inspection or medical imaging, where acquiring labels is costly or when we want to avoid introducing biases in the type of anomalies that can be spotted. In this work, we propose a novel UAD method based on prototype learning and introduce a metric to compare a structured set of embeddings that balances a feature-based cost and a spatial-based cost. We leverage this metric to learn local and global prototypes with optimal transport from latent representations extracted with a pre-trained image encoder. We demonstrate that our approach can enforce a structural constraint when learning the prototypes, allowing to capture the underlying organization of the normal samples, thus improving the detection of incoherencies in images. Our model achieves performance that is on par with strong baselines on two reference benchmarks for anomaly detection on industrial images.
Machine Intelligence on the Edge: Interpretable Cardiac Pattern Localisation Using Reinforcement Learning
Tian, Haozhe, Rao, Qiyu, Moutonnet, Nina, Ferraro, Pietro, Mandic, Danilo
Matched filters are widely used to localise signal patterns due to their high efficiency and interpretability. However, their effectiveness deteriorates for low signal-to-noise ratio (SNR) signals, such as those recorded on edge devices, where prominent noise patterns can closely resemble the target within the limited length of the filter. One example is the ear-electrocardiogram (ear-ECG), where the cardiac signal is attenuated and heavily corrupted by artefacts. To address this, we propose the Sequential Matched Filter (SMF), a paradigm that replaces the conventional single matched filter with a sequence of filters designed by a Reinforcement Learning agent. By formulating filter design as a sequential decision-making process, SMF adaptively design signal-specific filter sequences that remain fully interpretable by revealing key patterns driving the decision-making. The proposed SMF framework has strong potential for reliable and interpretable clinical decision support, as demonstrated by its state-of-the-art R-peak detection and physiological state classification performance on two challenging real-world ECG datasets. The proposed formulation can also be extended to a broad range of applications that require accurate pattern localisation from noise-corrupted signals.
Deep Active Learning for Lung Disease Severity Classification from Chest X-rays: Learning with Less Data in the Presence of Class Imbalance
Gabriel, Roy M., Zandehshahvar, Mohammadreza, van Assen, Marly, Kittisut, Nattakorn, Peters, Kyle, De Cecco, Carlo N., Adibi, Ali
To reduce the amount of required labeled data for lung disease severity classification from chest X-rays (CXRs) under class imbalance, this study applied deep active learning with a Bayesian Neural Network (BNN) approximation and weighted loss function. This retrospective study collected 2,319 CXRs from 963 patients (mean age, 59.2 $\pm$ 16.6 years; 481 female) at Emory Healthcare affiliated hospitals between January and November 2020. All patients had clinically confirmed COVID-19. Each CXR was independently labeled by 3 to 6 board-certified radiologists as normal, moderate, or severe. A deep neural network with Monte Carlo Dropout was trained using active learning to classify disease severity. Various acquisition functions were used to iteratively select the most informative samples from an unlabeled pool. Performance was evaluated using accuracy, area under the receiver operating characteristic curve (AU ROC), and area under the precision-recall curve (AU PRC). Training time and acquisition time were recorded. Statistical analysis included descriptive metrics and performance comparisons across acquisition strategies. Entropy Sampling achieved 93.7% accuracy (AU ROC, 0.91) in binary classification (normal vs. diseased) using 15.4% of the training data. In the multi-class setting, Mean STD sampling achieved 70.3% accuracy (AU ROC, 0.86) using 23.1% of the labeled data. These methods outperformed more complex and computationally expensive acquisition functions and significantly reduced labeling needs. Deep active learning with BNN approximation and weighted loss effectively reduces labeled data requirements while addressing class imbalance, maintaining or exceeding diagnostic performance.