Performance Analysis
AquaSignal: An Integrated Framework for Robust Underwater Acoustic Analysis
Panteli, Eirini, Santos, Paulo E., Humphrey, Nabil
This paper presents AquaSignal, a modular and scalable pipeline for preprocessing, denoising, classification, and novelty detection of underwater acoustic signals. Designed to operate effectively in noisy and dynamic marine environments, AquaSignal integrates state-of-the-art deep learning architectures to enhance the reliability and accuracy of acoustic signal analysis. The system is evaluated on a combined dataset from the Deepship and Ocean Networks Canada (ONC) benchmarks, providing a diverse set of real-world underwater scenarios. AquaSignal employs a U-Net architecture for denoising, a ResNet18 convolutional neural network for classifying known acoustic events, and an AutoEncoder-based model for unsupervised detection of novel or anomalous signals. To our knowledge, this is the first comprehensive study to apply and evaluate this combination of techniques on maritime vessel acoustic data. Experimental results show that AquaSignal improves signal clarity and task performance, achieving 71% classification accuracy and 91% accuracy in novelty detection. Despite slightly lower classification performance compared to some state-of-the-art models, differences in data partitioning strategies limit direct comparisons. Overall, AquaSignal demonstrates strong potential for real-time underwater acoustic monitoring in scientific, environmental, and maritime domains.
RAR: Setting Knowledge Tripwires for Retrieval Augmented Rejection
Buonocore, Tommaso Mario, Parimbelli, Enea
Content moderation for large language models (LLMs) is increasingly critical as LLMs are deployed in user-facing applications. Traditional moderation often relies on static classifiers or handcrafted prompt filters, which struggle to adapt quickly to new threats [1] . Recent analyses show that even retrieval-augmented generation (RAG) pipelines can inadvertently introduce safety risks, causing models to change their safety profile [2] . This paper presents retrieval-augmented rejection (RAR), a novel approach that repurposes the RAG architecture [3], typically used to enhance LLM knowledge, as a dynamic content moderation mechanism. By intentionally adding documents that mimic harmful content and questions (which we term "negative documents") to the vector database and flagging them accordingly, the system can leverage the retrieval mechanism to identify and reject malicious queries without requiring model retraining or architectural changes. The key contributions of this work include: i) a novel content moderation approach that requires no architectural changes to existing RAG systems; ii) a methodology for creating and maintaining "negative documents" for effective query filtering; iii) a flexible threshold-based rejection mechanism that can be dynamically adjusted; iv) preliminary evaluation against existing content moderation approaches. 1 1.1.
Open Set Domain Adaptation with Vision-language models via Gradient-aware Separation
Open-Set Domain Adaptation (OSDA) confronts the dual challenge of aligning known-class distributions across domains while identifying target-domain-specific unknown categories. Current approaches often fail to leverage semantic relationships between modalities and struggle with error accumulation in unknown sample detection. We propose to harness Contrastive Language-Image Pretraining (CLIP) to address these limitations through two key innovations: 1) Prompt-driven cross-domain alignment: Learnable textual prompts conditioned on domain discrepancy metrics dynamically adapt CLIP's text encoder, enabling semantic consistency between source and target domains without explicit unknown-class supervision. 2) Gradient-aware open-set separation: A gradient analysis module quantifies domain shift by comparing the L2-norm of gradients from the learned prompts, where known/unknown samples exhibit statistically distinct gradient behaviors. Evaluations on Office-Home show that our method consistently outperforms CLIP baseline and standard baseline. Ablation studies confirm the gradient norm's critical role.
Reachability Barrier Networks: Learning Hamilton-Jacobi Solutions for Smooth and Flexible Control Barrier Functions
Kim, Matthew, Sharpless, William, Jeong, Hyun Joe, Tonkens, Sander, Bansal, Somil, Herbert, Sylvia
Recent developments in autonomous driving and robotics underscore the necessity of safety-critical controllers. Control barrier functions (CBFs) are a popular method for appending safety guarantees to a general control framework, but they are notoriously difficult to generate beyond low dimensions. Existing methods often yield non-differentiable or inaccurate approximations that lack integrity, and thus fail to ensure safety. In this work, we use physics-informed neural networks (PINNs) to generate smooth approximations of CBFs by computing Hamilton-Jacobi (HJ) optimal control solutions. These reachability barrier networks (RBNs) avoid traditional dimensionality constraints and support the tuning of their conservativeness post-training through a parameterized discount term. To ensure robustness of the discounted solutions, we leverage conformal prediction methods to derive probabilistic safety guarantees for RBNs. We demonstrate that RBNs are highly accurate in low dimensions, and safer than the standard neural CBF approach in high dimensions. Namely, we showcase the RBNs in a 9D multi-vehicle collision avoidance problem where it empirically proves to be 5.5x safer and 1.9x less conservative than the neural CBFs, offering a promising method to synthesize CBFs for general nonlinear autonomous systems.
Early Diagnosis of Atrial Fibrillation Recurrence: A Large Tabular Model Approach with Structured and Unstructured Clinical Data
Domingo-Aldama, Ane G., Prado, Marcos Merino, Olea, Alain García, Galletebeitia, Koldo Gojenola, Salutregi, Josu Goikoetxea, Salazar, Aitziber Atutxa
BACKGROUND: Atrial fibrillation (AF), the most common arrhythmia, is linked to high morbidity and mortality. In a fast-evolving AF rhythm control treatment era, predicting AF recurrence after its onset may be crucial to achieve the optimal therapeutic approach, yet traditional scores like CHADS2-VASc, HATCH, and APPLE show limited predictive accuracy. Moreover, early diagnosis studies often rely on codified electronic health record (EHR) data, which may contain errors and missing information. OBJECTIVE: This study aims to predict AF recurrence between one month and two years after onset by evaluating traditional clinical scores, ML models, and our LTM approach. Moreover, another objective is to develop a methodology for integrating structured and unstructured data to enhance tabular dataset quality. METHODS: A tabular dataset was generated by combining structured clinical data with free-text discharge reports processed through natural language processing techniques, reducing errors and annotation effort. A total of 1,508 patients with documented AF onset were identified, and models were evaluated on a manually annotated test set. The proposed approach includes a LTM compared against traditional clinical scores and ML models. RESULTS: The proposed LTM approach achieved the highest predictive performance, surpassing both traditional clinical scores and ML models. Additionally, the gender and age bias analyses revealed demographic disparities. CONCLUSION: The integration of structured data and free-text sources resulted in a high-quality dataset. The findings emphasize the limitations of traditional clinical scores in predicting AF recurrence and highlight the potential of ML-based approaches, particularly our LTM model.
Interpretable Dual-Stream Learning for Local Wind Hazard Prediction in Vulnerable Communities
Nishu, Mahmuda Akhter, Huang, Chenyu, Roohi, Milad, Zhong, Xin
Wind hazards such as tornadoes and straight-line winds frequently affect vulnerable communities in the Great Plains of the United States, where limited infrastructure and sparse data coverage hinder effective emergency response. Existing forecasting systems focus primarily on meteorological elements and often fail to capture community-specific vulnerabilities, limiting their utility for localized risk assessment and resilience planning. To address this gap, we propose an interpretable dual-stream learning framework that integrates structured numerical weather data with unstructured textual event narratives. Our architecture combines a Random Forest and RoBERTa-based transformer through a late fusion mechanism, enabling robust and context-aware wind hazard prediction. The system is tailored for underserved tribal communities and supports block-level risk assessment. Experimental results show significant performance gains over traditional baselines. Furthermore, gradient-based sensitivity and ablation studies provide insight into the model's decision-making process, enhancing transparency and operational trust. The findings demonstrate both predictive effectiveness and practical value in supporting emergency preparedness and advancing community resilience.
Pairwise Evaluation of Accent Similarity in Speech Synthesis
Zhong, Jinzuomu, Liu, Suyuan, Wells, Dan, Richmond, Korin
Despite growing interest in generating high-fidelity accents, evaluating accent similarity in speech synthesis has been un-derexplored. We aim to enhance both subjective and objective evaluation methods for accent similarity. Subjectively, we refine the XAB listening test by adding components that achieve higher statistical significance with fewer listeners and lower costs. Our method involves providing listeners with transcriptions, having them highlight perceived accent differences, and implementing meticulous screening for reliability. Objectively, we utilise pronunciation-related metrics, based on distances between vowel formants and phonetic posteriorgrams, to evaluate accent generation. Comparative experiments reveal that these metrics, alongside accent similarity, speaker similarity, and Mel Cepstral Distortion, can be used. Moreover, our findings underscore significant limitations of common metrics like Word Error Rate in assessing underrepresented accents.
Explaining Unreliable Perception in Automated Driving: A Fuzzy-based Monitoring Approach
Salvi, Aniket, Weiss, Gereon, Trapp, Mario
Autonomous systems that rely on Machine Learning (ML) utilize online fault tolerance mechanisms, such as runtime monitors, to detect ML prediction errors and maintain safety during operation. However, the lack of human-interpretable explanations for these errors can hinder the creation of strong assurances about the system's safety and reliability. This paper introduces a novel fuzzy-based monitor tailored for ML perception components. It provides human-interpretable explanations about how different operating conditions affect the reliability of perception components and also functions as a runtime safety monitor. We evaluated our proposed monitor using naturalistic driving datasets as part of an automated driving case study. The interpretability of the monitor was evaluated and we identified a set of operating conditions in which the perception component performs reliably. Additionally, we created an assurance case that links unit-level evidence of \textit{correct} ML operation to system-level \textit{safety}. The benchmarking demonstrated that our monitor achieved a better increase in safety (i.e., absence of hazardous situations) while maintaining availability (i.e., ability to perform the mission) compared to state-of-the-art runtime ML monitors in the evaluated dataset.
Algorithmic Hiring and Diversity: Reducing Human-Algorithm Similarity for Better Outcomes
Parasurama, Prasanna, Ipeirotis, Panos
Algorithmic tools are increasingly used in hiring to improve fairness and diversity, often by enforcing constraints such as gender-balanced candidate shortlists. However, we show theoretically and empirically that enforcing equal representation at the shortlist stage does not necessarily translate into more diverse final hires, even when there is no gender bias in the hiring stage. We identify a crucial factor influencing this outcome: the correlation between the algorithm's screening criteria and the human hiring manager's evaluation criteria -- higher correlation leads to lower diversity in final hires. Using a large-scale empirical analysis of nearly 800,000 job applications across multiple technology firms, we find that enforcing equal shortlists yields limited improvements in hire diversity when the algorithmic screening closely mirrors the hiring manager's preferences. We propose a complementary algorithmic approach designed explicitly to diversify shortlists by selecting candidates likely to be overlooked by managers, yet still competitive according to their evaluation criteria. Empirical simulations show that this approach significantly enhances gender diversity in final hires without substantially compromising hire quality. These findings highlight the importance of algorithmic design choices in achieving organizational diversity goals and provide actionable guidance for practitioners implementing fairness-oriented hiring algorithms.
Federated learning in low-resource settings: A chest imaging study in Africa -- Challenges and lessons learned
Fabila, Jorge, Garrucho, Lidia, Campello, Víctor M., Martín-Isla, Carlos, Lekadir, Karim
This study explores the use of Federated Learning (FL) for tuberculosis (TB) diagnosis using chest X-rays in low-resource settings across Africa. FL allows hospitals to collaboratively train AI models without sharing raw patient data, addressing privacy concerns and data scarcity that hinder traditional centralized models. The research involved hospitals and research centers in eight African countries. Most sites used local datasets, while Ghana and The Gambia used public ones. The study compared locally trained models with a federated model built across all institutions to evaluate FL's real-world feasibility. Despite its promise, implementing FL in sub-Saharan Africa faces challenges such as poor infrastructure, unreliable internet, limited digital literacy, and weak AI regulations. Some institutions were also reluctant to share model updates due to data control concerns. In conclusion, FL shows strong potential for enabling AI-driven healthcare in underserved regions, but broader adoption will require improvements in infrastructure, education, and regulatory support.