Performance Analysis
Learning from Sparse Point Labels for Dense Carcinosis Localization in Advanced Ovarian Cancer Assessment
Zarin, Farahdiba, Oliva, Riccardo, Srivastav, Vinkle, Vardazaryan, Armine, Rosati, Andrea, Faustini, Alice Zampolini, Scambia, Giovanni, Fagotti, Anna, Mascagni, Pietro, Padoy, Nicolas
Learning from sparse labels is a challenge commonplace in the medical domain. This is due to numerous factors, such as annotation cost, and is especially true for newly introduced tasks. When dense pixel-level annotations are needed, this becomes even more unfeasible. However, being able to learn from just a few annotations at the pixel-level, while extremely difficult and underutilized, can drive progress in studies where perfect annotations are not immediately available. This work tackles the challenge of learning the dense prediction task of keypoint localization from a few point annotations in the context of 2d carcinosis keypoint localization from laparoscopic video frames for diagnostic planning of advanced ovarian cancer patients. To enable this, we formulate the problem as a sparse heatmap regression from a few point annotations per image and propose a new loss function, called Crag and Tail loss, for efficient learning. Our proposed loss function effectively leverages positive sparse labels while minimizing the impact of false negatives or missed annotations. Through an extensive ablation study, we demonstrate the effectiveness of our approach in achieving accurate dense localization of carcinosis keypoints, highlighting its potential to advance research in scenarios where dense annotations are challenging to obtain.
EA: An Event Autoencoder for High-Speed Vision Sensing
Islam, Riadul, Mulé, Joey, Challagundla, Dhandeep, Rizvi, Shahmir, Carson, Sean
High-speed vision sensing is essential for real-time perception in applications such as robotics, autonomous vehicles, and industrial automation. Traditional frame-based vision systems suffer from motion blur, high latency, and redundant data processing, limiting their performance in dynamic environments. Event cameras, which capture asynchronous brightness changes at the pixel level, offer a promising alternative but pose challenges in object detection due to sparse and noisy event streams. To address this, we propose an event autoencoder architecture that efficiently compresses and reconstructs event data while preserving critical spatial and temporal features. The proposed model employs convolutional encoding and incorporates adaptive threshold selection and a lightweight classifier to enhance recognition accuracy while reducing computational complexity. Experimental results on the existing Smart Event Face Dataset (SEFD) demonstrate that our approach achieves comparable accuracy to the YOLO-v4 model while utilizing up to $35.5\times$ fewer parameters. Implementations on embedded platforms, including Raspberry Pi 4B and NVIDIA Jetson Nano, show high frame rates ranging from 8 FPS up to 44.8 FPS. The proposed classifier exhibits up to 87.84x better FPS than the state-of-the-art and significantly improves event-based vision performance, making it ideal for low-power, high-speed applications in real-time edge computing.
eegFloss: A Python package for refining sleep EEG recordings using machine learning models
Sikder, Niloy, Zerr, Paul, Esfahani, Mahdad Jafarzadeh, Dresler, Martin, Krauledat, Matthias
Electroencephalography (EEG) allows monitoring of brain activity, providing insights into the functional dynamics of various brain regions and their roles in cognitive processes. EEG is a cornerstone in sleep research, serving as the primary modality of polysomnography, the gold standard in the field. However, EEG signals are prone to artifacts caused by both internal (device-specific) factors and external (environmental) interferences. As sleep studies are becoming larger, most rely on automatic sleep staging, a process highly susceptible to artifacts, leading to erroneous sleep scores. This paper addresses this challenge by introducing eegFloss, an open-source Python package to utilize eegUsability, a novel machine learning (ML) model designed to detect segments with artifacts in sleep EEG recordings. eegUsability has been trained and evaluated on manually artifact-labeled EEG data collected from 15 participants over 127 nights using the Zmax headband. It demonstrates solid overall classification performance (F1-score is approximately 0.85, Cohens kappa is 0.78), achieving a high recall rate of approximately 94% in identifying channel-wise usable EEG data, and extends beyond Zmax. Additionally, eegFloss offers features such as automatic time-in-bed detection using another ML model named eegMobility, filtering out certain artifacts, and generating hypnograms and sleep statistics. By addressing a fundamental challenge faced by most sleep studies, eegFloss can enhance the precision and rigor of their analysis as well as the accuracy and reliability of their outcomes.
Jolting Technologies: Superexponential Acceleration in AI Capabilities and Implications for AGI
This paper investigates the Jolting Technologies Hypothesis, which posits superexponential growth (increasing acceleration, or a positive third derivative) in the development of AI capabilities. We develop a theoretical framework and validate detection methodologies through Monte Carlo simulations, while acknowledging that empirical validation awaits suitable longitudinal data. Our analysis focuses on creating robust tools for future empirical studies and exploring the potential implications should the hypothesis prove valid. The study examines how factors such as shrinking idea-to-action intervals and compounding iterative AI improvements drive this jolting pattern. By formalizing jolt dynamics and validating detection methods through simulation, this work provides the mathematical foundation necessary for understanding potential AI trajectories and their consequences for AGI emergence, offering insights for research and policy.
PackHero: A Scalable Graph-based Approach for Efficient Packer Identification
Di Gennaro, Marco, D'Onghia, Mario, Polino, Mario, Zanero, Stefano, Carminati, Michele
Existing packer identifiers have significant limitations: signature-based methods lack flexibility and struggle against dynamic evasion, while Machine Learning approaches require extensive training data, limiting scalability and adaptability. Consequently, achieving accurate and adaptable packer identification remains an open problem. This paper presents PackHero, a scalable and efficient methodology for identifying packers using a novel static approach. PackHero employs a Graph Matching Network and clustering to match and group Call Graphs from programs packed with known packers. We evaluate our approach on a public dataset of malware and benign samples packed with various packers, demonstrating its effectiveness and scalability across varying sample sizes. PackHero achieves a macro-average F1-score of 93.7% with just 10 samples per packer, improving to 98.3% with 100 samples. Notably, PackHero requires fewer samples to achieve stable performance compared to other Machine Learning-based tools. Overall, PackHero matches the performance of State-of-the-art signature-based tools, outperforming them in handling Virtualization-based packers such as Themida/Winlicense, with a recall of 100%.
Few-shot text-based emotion detection
Marchitan, Teodor-George, Creanga, Claudiu, Dinu, Liviu P.
This paper describes the approach of the Unibuc - NLP team in tackling the SemEval 2025 Workshop, Task 11: Bridging the Gap in Text-Based Emotion Detection. We mainly focused on experiments using large language models (Gemini, Qwen, DeepSeek) with either few-shot prompting or fine-tuning. With our final system, for the multi-label emotion detection track (track A), we got an F1-macro of $0.7546$ (26/96 teams) for the English subset, $0.1727$ (35/36 teams) for the Portuguese (Mozambican) subset and $0.325$ (\textbf{1}/31 teams) for the Emakhuwa subset.
Robotic System with AI for Real Time Weed Detection, Canopy Aware Spraying, and Droplet Pattern Evaluation
Rasool, Inayat, Yadav, Pappu Kumar, Parmar, Amee, Mirzakhaninafchi, Hasan, Budhathoki, Rikesh, Usmani, Zain Ul Abideen, Paudel, Supriya, Olivera, Ivan Perez, Jone, Eric
Uniform and excessive herbicide application in modern agriculture contributes to increased input costs, environmental pollution, and the emergence of herbicide resistant weeds. To address these challenges, we developed a vision guided, AI-driven variable rate sprayer system capable of detecting weed presence, estimating canopy size, and dynamically adjusting nozzle activation in real time. The system integrates lightweight YOLO11n and YOLO11n-seg deep learning models, deployed on an NVIDIA Jetson Orin Nano for onboard inference, and uses an Arduino Uno-based relay interface to control solenoid actuated nozzles based on canopy segmentation results. Indoor trials were conducted using 15 potted Hibiscus rosa sinensis plants of varying canopy sizes to simulate a range of weed patch scenarios. The YOLO11n model achieved a mean average precision (mAP@50) of 0.98, with a precision of 0.99 and a recall close to 1.0. The YOLO11n-seg segmentation model achieved a mAP@50 of 0.48, precision of 0.55, and recall of 0.52. System performance was validated using water sensitive paper, which showed an average spray coverage of 24.22% in zones where canopy was present. An upward trend in mean spray coverage from 16.22% for small canopies to 21.46% and 21.65% for medium and large canopies, respectively, demonstrated the system's capability to adjust spray output based on canopy size in real time. These results highlight the potential of combining real time deep learning with low-cost embedded hardware for selective herbicide application. Future work will focus on expanding the detection capabilities to include three common weed species in South Dakota: water hemp (Amaranthus tuberculatus), kochia (Bassia scoparia), and foxtail (Setaria spp.), followed by further validation in both indoor and field trials within soybean and corn production systems.
Robust Learning on Noisy Graphs via Latent Space Constraints with External Knowledge
Gu, Chunhui, Nasr, Mohammad Sadegh, Long, James P., Do, Kim-Anh, Irajizad, Ehsan
Graph Neural Networks (GNNs) often struggle with noisy edges. We propose Latent Space Constrained Graph Neural Networks (LSC-GNN) to incorporate external "clean" links and guide embeddings of a noisy target graph. We train two encoders--one on the full graph (target plus external edges) and another on a regularization graph excluding the target's potentially noisy links--then penalize discrepancies between their latent representations. This constraint steers the model away from overfitting spurious edges. Experiments on benchmark datasets show LSC-GNN outperforms standard and noise-resilient GNNs in graphs subjected to moderate noise. We extend LSC-GNN to heterogeneous graphs and validate it on a small protein-metabolite network, where metabolite-protein interactions reduce noise in protein co-occurrence data. Our results highlight LSC-GNN's potential to boost predictive performance and interpretability in settings with noisy relational structures.
An autonomous agent for auditing and improving the reliability of clinical AI models
Kuhn, Lukas, Buettner, Florian
The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting or demographics can erode accuracy, but currently reliability auditing to identify such catastrophic failure cases before deployment is a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios - inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography - demonstrates that ModelAuditor is able correctly identify context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture and execute on consumer hardware in under 10 minutes, costing less than US$0.50 per audit.
Estimating prevalence with precision and accuracy
Igiraneza, Aime Bienfait, Fraser, Christophe, Hinch, Robert
Unlike classification, whose goal is to estimate the class of each data point in a dataset, prevalence estimation or quantification is a task that aims to estimate the distribution of classes in a dataset. The two main tasks in prevalence estimation are to adjust for bias, due to the prevalence in the training dataset, and to quantify the uncertainty in the estimate. The standard methods used to quantify uncertainty in prevalence estimates are bootstrapping and Bayesian quantification methods. It is not clear which approach is ideal in terms of precision (i.e. the width of confidence intervals) and coverage (i.e. the confidence intervals being well-calibrated). Here, we propose Precise Quantifier (PQ), a Bayesian quantifier that is more precise than existing quantifiers and with well-calibrated coverage. We discuss the theory behind PQ and present experiments based on simulated and real-world datasets. Through these experiments, we establish the factors which influence quantification precision: the discriminatory power of the underlying classifier; the size of the labeled dataset used to train the quantifier; and the size of the unlabeled dataset for which prevalence is estimated. Our analysis provides deep insights into uncertainty quantification for quantification learning.