Performance Analysis
No Label Left Behind: A Unified Surface Defect Detection Model for all Supervision Regimes
Rolih, Blaž, Fučka, Matic, Skočaj, Danijel
Surface defect detection is a critical task across numerous industries, aimed at efficiently identifying and localising imperfections or irregularities on manufactured components. While numerous methods have been proposed, many fail to meet industrial demands for high performance, efficiency, and adaptability. Existing approaches are often constrained to specific supervision scenarios and struggle to adapt to the diverse data annotations encountered in real-world manufacturing processes, such as unsupervised, weakly supervised, mixed supervision, and fully supervised settings. To address these challenges, we propose SuperSimpleNet, a highly efficient and adaptable discriminative model built on the foundation of SimpleNet. SuperSimpleNet incorporates a novel synthetic anomaly generation process, an enhanced classification head, and an improved learning procedure, enabling efficient training in all four supervision scenarios, making it the first model capable of fully leveraging all available data annotations. SuperSimpleNet sets a new standard for performance across all scenarios, as demonstrated by its results on four challenging benchmark datasets. Beyond accuracy, it is very fast, achieving an inference time below 10 ms. With its ability to unify diverse supervision paradigms while maintaining outstanding speed and reliability, SuperSimpleNet represents a promising step forward in addressing real-world manufacturing challenges and bridging the gap between academic research and industrial applications. Code: https://github.com/blaz-r/SuperSimpleNet
CLAIRE-DSA: Fluoroscopic Image Classification for Quality Assurance of Computer Vision Pipelines in Acute Ischemic Stroke
Berg, Cristo J. van den, Nijenhuis, Frank G. te, Blaauboer, Mirre J., van Erp, Daan T. W., Keppels, Carlijn M., van der Sluijs, Matthijs, Roozenbeek, Bob, van Zwam, Wim, Cornelissen, Sandra, Ruijters, Danny, Su, Ruisheng, van Walsum, Theo
Computer vision models can be used to assist during mechanical thrombectomy (MT) for acute ischemic stroke (AIS), but poor image quality often degrades performance. This work presents CLAIRE-DSA, a deep learning--based framework designed to categorize key image properties in minimum intensity projections (MinIPs) acquired during MT for AIS, supporting downstream quality control and workflow optimization. CLAIRE-DSA uses pre-trained ResNet backbone models, fine-tuned to predict nine image properties (e.g., presence of contrast, projection angle, motion artefact severity). Separate classifiers were trained on an annotated dataset containing $1,758$ fluoroscopic MinIPs. The model achieved excellent performance on all labels, with ROC-AUC ranging from $0.91$ to $0.98$, and precision ranging from $0.70$ to $1.00$. The ability of CLAIRE-DSA to identify suitable images was evaluated on a segmentation task by filtering poor quality images and comparing segmentation performance on filtered and unfiltered datasets. Segmentation success rate increased from $42%$ to $69%$, $p < 0.001$. CLAIRE-DSA demonstrates strong potential as an automated tool for accurately classifying image properties in DSA series of acute ischemic stroke patients, supporting image annotation and quality control in clinical and research applications. Source code is available at https://gitlab.com/icai-stroke-lab/wp3_neurointerventional_ai/claire-dsa.
Operationalizing Automated Essay Scoring: A Human-Aware Approach
This paper explores the human-centric operationalization of Automated Essay Scoring (AES) systems, addressing aspects beyond accuracy. We compare various machine learning-based approaches with Large Language Models (LLMs) approaches, identifying their strengths, similarities and differences. The study investigates key dimensions such as bias, robustness, and explainability, considered important for human-aware operationalization of AES systems. Our study shows that ML-based AES models outperform LLMs in accuracy but struggle with explainability, whereas LLMs provide richer explanations. We also found that both approaches struggle with bias and robustness to edge scores. By analyzing these dimensions, the paper aims to identify challenges and trade-offs between different methods, contributing to more reliable and trustworthy AES methods.
Context-aware deep learning using individualized prior information reduces false positives in disease risk prediction and longitudinal health assessment
Umapathy, Lavanya, Johnson, Patricia M, Dutt, Tarun, Tong, Angela, Nayan, Madhur, Chandarana, Hersh, Sodickson, Daniel K
Temporal context in medicine is valuable in assessing key changes in patient health over time. We developed a machine learning framework to integrate diverse context from prior visits to improve health monitoring, especially when prior visits are limited and their frequency is variable. Our model first estimates initial risk of disease using medical data from the most recent patient visit, then refines this assessment using information digested from previously collected imaging and/or clinical biomarkers. We applied our framework to prostate cancer (PCa) risk prediction using data from a large population (28,342 patients, 39,013 magnetic resonance imaging scans, 68,931 blood tests) collected over nearly a decade. For predictions of the risk of clinically significant PCa at the time of the visit, integrating prior context directly converted false positives to true negatives, increasing overall specificity while preserving high sensitivity. False positive rates were reduced progressively from 51% to 33% when integrating information from up to three prior imaging examinations, as compared to using data from a single visit, and were further reduced to 24% when also including additional context from prior clinical data. For predicting the risk of PCa within five years of the visit, incorporating prior context reduced false positive rates still further (64% to 9%). Our findings show that information collected over time provides relevant context to enhance the specificity of medical risk prediction. For a wide range of progressive conditions, sufficient reduction of false positive rates using context could offer a pathway to expand longitudinal health monitoring programs to large populations with comparatively low baseline risk of disease, leading to earlier detection and improved health outcomes.
Hypergraph Contrastive Sensor Fusion for Multimodal Fault Diagnosis in Induction Motors
Ali, Usman, Zia, Ali, Ali, Waqas, Ramzan, Umer, Rehman, Abdul, Chaudhry, Muhammad Tayyab, Xiang, Wei
Abstract--Reliable induction motor (IM) fault diagnosis is vital for industrial safety and operational continuity, mitigating costly unplanned downtime. Conventional approaches often struggle to capture complex multimodal signal relationships, are constrained to unimodal data or single fault types, and exhibit performance degradation under noisy or cross-domain conditions. This paper proposes the Multimodal Hypergraph Contrastive Attention Network (MM-HCAN), a unified framework for robust fault diagnosis. T o the best of our knowledge, MM-HCAN is the first to integrate contrastive learning within a hypergraph topology specifically designed for multimodal sensor fusion, enabling the joint modelling of intra-and inter-modal dependencies and enhancing generalisation beyond Euclidean embedding spaces. Evaluated on three real-world benchmarks, MM-HCAN achieves up to 99.82% accuracy with strong cross-domain generalisation and resilience to noise, demonstrating its suitability for real-world deployment. MM-HCAN provides a scalable and robust solution for comprehensive multi-fault diagnosis, supporting predictive maintenance and extended asset longevity in industrial environments. NDUCTION motors (IMs) are essential to modern industrial systems, supporting sectors like manufacturing, energy, and transportation. However, faults in IMs can cause downtime, high maintenance costs, and substantial economic losses. As a result, fault diagnosis in IMs has become a focal point of research, with recent studies highlighting its importance in enhancing operational resilience and minimising financial impacts. IMs faults are broadly classified as either electrical, with stator faults comprising 28-36%, or mechanical, encompassing bearing (42-55%) and rotor (8-10%) failures [1].
An Empirical Study on MC Dropout--Based Uncertainty--Error Correlation in 2D Brain Tumor Segmentation
Accurate brain tumor segmentation from MRI is vital for diagnosis and treatment planning. Although Monte Carlo (MC) Dropout is widely used to estimate model uncertainty, its effectiveness in identifying segmentation errors -- especially near tumor boundaries -- remains unclear. This study empirically examines the relationship between MC Dropout--based uncertainty and segmentation error in 2D brain tumor MRI segmentation using a U-Net trained under four augmentation settings: none, horizontal flip, rotation, and scaling. Uncertainty was computed from 50 stochastic forward passes and correlated with pixel-wise errors using Pearson and Spearman coefficients. Results show weak global correlations ($r \approx 0.30$--$0.38$) and negligible boundary correlations ($|r| < 0.05$). Although differences across augmentations were statistically significant ($p < 0.001$), they lacked practical relevance. These findings suggest that MC Dropout uncertainty provides limited cues for boundary error localization, underscoring the need for alternative or hybrid uncertainty estimation methods in medical image segmentation.
From Checklists to Clusters: A Homeostatic Account of AGI Evaluation
Contemporary AGI evaluations report multidomain capability profiles, yet they typically assign symmetric weights and rely on snapshot scores. This creates two problems: (i) equal weighting treats all domains as equally important when human intelligence research suggests otherwise, and (ii) snapshot testing can't distinguish durable capabilities from brittle performances that collapse under delay or stress. I argue that general intelligence -- in humans and potentially in machines -- is better understood as a homeostatic property cluster: a set of abilities plus the mechanisms that keep those abilities co-present under perturbation. On this view, AGI evaluation should weight domains by their causal centrality (their contribution to cluster stability) and require evidence of persistence across sessions. I propose two battery-compatible extensions: a centrality-prior score that imports CHC-derived weights with transparent sensitivity analysis, and a Cluster Stability Index family that separates profile persistence, durable learning, and error correction. These additions preserve multidomain breadth while reducing brittleness and gaming. I close with testable predictions and black-box protocols labs can adopt without architectural access.
WELD: A Large-Scale Longitudinal Dataset of Emotional Dynamics for Ubiquitous Affective Computing
Automated emotion recognition in real-world workplace settings remains a challenging problem in affective computing due to the scarcity of large-scale, longitudinal datasets collected in naturalistic environments. We present a novel dataset comprising 733,651 facial expression records from 38 employees collected over 30.5 months (November 2021 to May 2024) in an authentic office environment. Each record contains seven emotion probabilities (neutral, happy, sad, surprised, fear, disgusted, angry) derived from deep learning-based facial expression recognition, along with comprehensive metadata including job roles, employment outcomes, and personality traits. The dataset uniquely spans the COVID-19 pandemic period, capturing emotional responses to major societal events including the Shanghai lockdown and policy changes. We provide 32 extended emotional metrics computed using established affective science methods, including valence, arousal, volatility, predictability, inertia, and emotional contagion strength. Technical validation demonstrates high data quality through successful replication of known psychological patterns (weekend effect: +192% valence improvement, p < 0.001; diurnal rhythm validated) and perfect predictive validity for employee turnover (AUC=1.0). Baseline experiments using Random Forest and LSTM models achieve 91.2% accuracy for emotion classification and R2 = 0.84 for valence prediction. This is the largest and longest longitudinal workplace emotion dataset publicly available, enabling research in emotion recognition, affective dynamics modeling, emotional contagion, turnover prediction, and emotion-aware system design.
TangledFeatures: Robust Feature Selection in Highly Correlated Spaces
Feature selection is a fundamental step in model development, shaping both predictive performance and interpretability. Y et, most widely used methods focus on predictive accuracy, and their performance degrades in the presence of correlated predictors. To address this gap, we introduce TangledFeatures, a framework for feature selection in correlated feature spaces. It identifies representative features from groups of entangled predictors, reducing redundancy while retaining explanatory power. The resulting feature subset can be directly applied in downstream models, offering a more interpretable and stable basis for analysis compared to traditional selection techniques. We demonstrate the effectiveness of TangledFeatures on Alanine Dipeptide, applying it to the prediction of backbone torsional angles ϕ and ψ, and show that the selected features correspond to structurally meaningful intra-atomic distances that explain variation in these angles.
Evaluation and Implementation of Machine Learning Algorithms to Predict Early Detection of Kidney and Heart Disease in Diabetic Patients
Cardiovascular disease and chronic kidney disease are major complications of diabetes, leading to high morbidity and mortality. Early detection of these conditions is critical, yet traditional diagnostic markers often lack sensitivity in the initial stages. This study integrates conventional statistical methods with machine learning approaches to improve early diagnosis of CKD and CVD in diabetic patients. Descriptive and inferential statistics were computed in SPSS to explore associations between diseases and clinical or demographic factors. Patients were categorized into four groups: Group A both CKD and CVD, Group B CKD only, Group C CVD only, and Group D no disease. Statistical analysis revealed significant correlations: Serum Creatinine and Hypertension with CKD, and Cholesterol, Triglycerides, Myocardial Infarction, Stroke, and Hypertension with CVD. These results guided the selection of predictive features for machine learning models. Logistic Regression, Support Vector Machine, and Random Forest algorithms were implemented, with Random Forest showing the highest accuracy, particularly for CKD prediction. Ensemble models outperformed single classifiers in identifying high-risk diabetic patients. SPSS results further validated the significance of the key parameters integrated into the models. While challenges such as interpretability and class imbalance remain, this hybrid statistical machine learning framework offers a promising advancement toward early detection and risk stratification of diabetic complications compared to conventional diagnostic approaches.