Performance Analysis
MedVQA-TREE: A Multimodal Reasoning and Retrieval Framework for Sarcopenia Prediction
Moradbeiki, Pardis, Ghadiri, Nasser, Zahabi, Sayed Jalal, Wiil, Uffe Kock, Brockhattingen, Kristoffer Kittelmann, Ebrahimi, Ali
Accurate sarcopenia diagnosis via ultrasound remains challenging due to subtle imaging cues, limited labeled data, and the absence of clinical context in most models. We propose MedVQA-TREE, a multimodal framework that integrates a hierarchical image interpretation module, a gated feature-level fusion mechanism, and a novel multi-hop, multi-query retrieval strategy. The vision module includes anatomical classification, region segmentation, and graph-based spatial reasoning to capture coarse, mid-level, and fine-grained structures. A gated fusion mechanism selectively integrates visual features with textual queries, while clinical knowledge is retrieved through a UMLS-guided pipeline accessing PubMed and a sarcopenia-specific external knowledge base. MedVQA-TREE was trained and evaluated on two public MedVQA datasets (VQA-RAD and PathVQA) and a custom sarcopenia ultrasound dataset. The model achieved up to 99% diagnostic accuracy and outperformed previous state-of-the-art methods by over 10%. These results underscore the benefit of combining structured visual understanding with guided knowledge retrieval for effective AI-assisted diagnosis in sarcopenia.
The Next Layer: Augmenting Foundation Models with Structure-Preserving and Attention-Guided Learning for Local Patches to Global Context Awareness in Computational Pathology
Waqas, Muhammad, Bandyopadhyay, Rukhmini, Showkatian, Eman, Muneer, Amgad, Zafar, Anas, Alvarez, Frank Rojas, Marin, Maricel Corredor, Li, Wentao, Jaffray, David, Haymaker, Cara, Heymach, John, Vokes, Natalie I, Soto, Luisa Maren Solis, Zhang, Jianjun, Wu, Jia
Foundation models have recently emerged as powerful feature extractors in computational pathology, yet they typically omit mechanisms for leveraging the global spatial structure of tissues and the local contextual relationships among diagnostically relevan t regions -- key elements for understanding the tumor microenvironment. Multiple instance learning (MIL) remains an essential next step following foundation model, designing a framework to aggregate patch - level features into slide - level predictions. We presen t EAGLE - Net, a structure - preserving, attention - guided MIL architecture designed to augment prediction and interpretability. EAGLE - Net integrates multi - scale absolute spatial encoding to capture global tissue architecture, a top - K neighborhood - aware loss to focus attention on local microenvironments, and background suppression loss to minimize false positives. We benchmarked EAGLE - Net on large pan - cancer datasets, including three cancer types for classification (10,260 slides) and seven cancer types for surv ival prediction (4,172 slides), using three distinct histology foundation backbones (REMEDIES, Uni - V1, Uni2 - h). Across tasks, EAGLE - Net achieved up to 3% higher classification accuracy and the top concordance indices in 6 of 7 cancer types, producing smoot h, biologically coherent attention maps that aligned with expert annotations and highlighted invasive fronts, necrosis, and immune infiltration. These results position EAGLE - Net as a generalizable, interpretable framework that complements foundation models, enabling improved biomarker discovery, prognostic modeling, and clinical decision support.
Large VLM-based Stylized Sports Captioning
Dhar, Sauptik, Buoncristiani, Nicholas, Anakata, Joe, Zhang, Haoyu, Munson, Michelle
The advent of large (visual) language models (LLM / LVLM) have led to a deluge of automated human-like systems in several domains including social media content generation, search and recommendation, healthcare prognosis, AI assistants for cognitive tasks etc. Although these systems have been successfully integrated in production; very little focus has been placed on sports, particularly accurate identification and natural language description of the game play. Most existing LLM/LVLMs can explain generic sports activities, but lack sufficient domain-centric sports' jargon to create natural (human-like) descriptions. This work highlights the limitations of existing SoTA LLM/LVLMs for generating production-grade sports captions from images in a desired stylized format, and proposes a two-level fine-tuned LVLM pipeline to address that. The proposed pipeline yields an improvement > 8-10% in the F1, and > 2-10% in BERT score compared to alternative approaches. In addition, it has a small runtime memory footprint and fast execution time. During Super Bowl LIX the pipeline proved its practical application for live professional sports journalism; generating highly accurate and stylized captions at the rate of 6 images per 3-5 seconds for over 1000 images during the game play.
Is data-efficient learning feasible with quantum models?
Sakhnenko, Alona, Mendl, Christian B., Lorenz, Jeanette M.
The importance of analyzing nontrivial datasets when testing quantum machine learning (QML) models is becoming increasingly prominent in literature, yet a cohesive framework for understanding dataset characteristics remains elusive. In this work, we concentrate on the size of the dataset as an indicator of its complexity and explores the potential for QML models to demonstrate superior data-efficiency compared to classical models, particularly through the lens of quantum kernel methods (QKMs). We provide a method for generating semi-artificial fully classical datasets, on which we show one of the first evidence of the existence of classical datasets where QKMs require less data during training. Additionally, our study introduces a new analytical tool to the QML domain, derived for classical kernel methods, which can be aimed at investigating the classical-quantum gap. Our empirical results reveal that QKMs can achieve low error rates with less training data compared to classical counterparts. Furthermore, our method allows for the generation of datasets with varying properties, facilitating further investigation into the characteristics of real-world datasets that may be particularly advantageous for QKMs. We also show that the predicted performance from the analytical tool we propose - a generalization metric from classical domain - show great alignment empirical evidence, which fills the gap previously existing in the field. We pave a way to a comprehensive exploration of dataset complexities, providing insights into how these complexities influence QML performance relative to traditional methods. This research contributes to a deeper understanding of the generalization benefits of QKM models and potentially a broader family of QML models, setting the stage for future advancements in the field.
Scalable, Technology-Agnostic Diagnosis and Predictive Maintenance for Point Machine using Deep Learning
Di Santi, Eduardo, Ci, Ruixiang, Lefebvre, Clรฉment, Mijatovic, Nenad, Pugnaloni, Michele, Brown, Jonathan, Martรญn, Victor, Saiah, Kenza
The Point Machine (PM) is a critical piece of railway equipment that switches train routes by diverting tracks through a switchblade. As with any critical safety equipment, a failure will halt operations leading to service disruptions; therefore, pre-emptive maintenance may avoid unnecessary interruptions by detecting anomalies before they become failures. Previous work relies on several inputs and crafting custom features by segmenting the signal. This not only adds additional requirements for data collection and processing, but it is also specific to the PM technology, the installed locations and operational conditions limiting scalability. Based on the available maintenance records, the main failure causes for PM are obstacles, friction, power source issues and misalignment. Those failures affect the energy consumption pattern of PMs, altering the usual (or healthy) shape of the power signal during the PM movement. In contrast to the current state-of-the-art, our method requires only one input. We apply a deep learning model to the power signal pattern to classify if the PM is nominal or associated with any failure type, achieving >99.99\% precision, <0.01\% false positives and negligible false negatives. Our methodology is generic and technology-agnostic, proven to be scalable on several electromechanical PM types deployed in both real-world and test bench environments. Finally, by using conformal prediction the maintainer gets a clear indication of the certainty of the system outputs, adding a confidence layer to operations and making the method compliant with the ISO-17359 standard.
Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery
Lan, Jing, Ding, Hexiao, Chen, Hongzhao, Jiang, Yufeng, Ng, Nga-Chun, Cheng, Gerald W. Y., Li, Zongxi, Cai, Jing, Lin, Liang-ting, Yoo, Jung Sun
Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input. This design enables the model to learn both structural flexibility and environmental context in a unified manner. The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction to model spatial relationships, and contrastive learning to build solvent-invariant molecular representations. Together, these components lead to significant improvements, including a 3.7% gain in binding affinity prediction, an 82% success rate on the PoseBusters Astex docking benchmarks, and an area under the curve of 97.1% in virtual screening. The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks. A case study further demonstrates sub-angstrom docking accuracy with a root-mean-square deviation of 0.157 angstroms, offering atomic-level insight into binding mechanisms and advancing structure-based drug design.
Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation
Spezia, Afonso Martini, Fontanari, Thomas, Recamonde-Mendoza, Mariana
Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data subsets (folds) that do not adequately represent the diversity of the original dataset, which can lead to biased performance estimates. The objective of this work is to deepen the investigation of cluster-based cross-validation strategies by analyzing the performance of different clustering algorithms through experimental comparison. Additionally, a new cross-validation technique that combines Mini Batch K-Means with class stratification is proposed. Experiments were conducted on 20 datasets (both balanced and imbalanced) using four supervised learning algorithms, comparing cross-validation strategies in terms of bias, variance, and computational cost. The technique that uses Mini Batch K-Means with class stratification outperformed others in terms of bias and variance on balanced datasets, though it did not significantly reduce computational cost. On imbalanced datasets, traditional stratified cross-validation consistently performed better, showing lower bias, variance, and computational cost, making it a safe choice for performance evaluation in scenarios with class imbalance. In the comparison of different clustering algorithms, no single algorithm consistently stood out as superior. Overall, this work contributes to improving predictive model evaluation strategies by providing a deeper understanding of the potential of cluster-based data splitting techniques and reaffirming the effectiveness of well-established strategies like stratified cross-validation. Moreover, it highlights perspectives for increasing the robustness and reliability of model evaluations, especially in datasets with clustering characteristics.
Bootstrapping Learned Cost Models with Synthetic SQL Queries
Nidd, Michael, Miksovic, Christoph, Gschwind, Thomas, Fusco, Francesco, Giovannini, Andrea, Giurgiu, Ioana
Having access to realistic workloads for a given database instance is extremely important to enable stress and vulnerability testing, as well as to optimize for cost and performance. Recent advances in learned cost models have shown that when enough diverse SQL queries are available, one can effectively and efficiently predict the cost of running a given query against a specific database engine. In this paper, we describe our experience in exploiting modern synthetic data generation techniques, inspired by the generative AI and LLM community, to create high-quality datasets enabling the effective training of such learned cost models. Initial results show that we can improve a learned cost model's predictive accuracy by training it with 45% fewer queries than when using competitive generation approaches.
A bag of tricks for real-time Mitotic Figure detection
Marzahl, Christian, Napora, Brian
Mitotic figure (MF) detection in histopathology images is challenging due to large variations in slide scanners, staining protocols, tissue types, and the presence of artifacts. This paper presents a collection of training techniques - a bag of tricks - that enable robust, real-time MF detection across diverse domains. We build on the efficient RTMDet single stage object detector to achieve high inference speed suitable for clinical deployment. Our method addresses scanner variability and tumor heterogeneity via extensive multi-domain training data, balanced sampling, and careful augmentation. Additionally, we employ targeted, hard negative mining on necrotic and debris tissue to reduce false positives. In a grouped 5-fold cross-validation across multiple MF datasets, our model achieves an F1 score between 0.78 and 0.84. On the preliminary test set of the MItosis DOmain Generalization (MIDOG) 2025 challenge, our single-stage RTMDet-S based approach reaches an F1 of 0.81, outperforming larger models and demonstrating adaptability to new, unfamiliar domains. The proposed solution offers a practical trade-off between accuracy and speed, making it attractive for real-world clinical adoption.
Context-Aware Risk Estimation in Home Environments: A Probabilistic Framework for Service Robots
Ishii, Sena, Chikhalikar, Akash, Ravankar, Ankit A., Luces, Jose Victorio Salazar, Hirata, Yasuhisa
-- We present a novel framework for estimating accident-prone regions in everyday indoor scenes, aimed at improving real-time risk awareness in service robots operating in human-centric environments. As robots become integrated into daily life, particularly in homes, the ability to anticipate and respond to environmental hazards is crucial for ensuring user safety, trust, and effective human-robot interaction. Each object is represented as a node with an associated risk score, and risk propagates asymmetrically from high-risk to low-risk objects based on spatial proximity and accident relationship. This enables the robot to infer potential hazards even when they are not explicitly visible or labeled. Designed for interpretability and lightweight onboard deployment, our method is validated on a dataset with human-annotated risk regions, achieving a binary risk detection accuracy of 75%. The system demonstrates strong alignment with human perception, particularly in scenes involving sharp or unstable objects. These results underline the potential of context-aware risk reasoning to enhance robotic scene understanding and proactive safety behaviors in shared human-robot spaces. This framework could serve as a foundation for future systems that make context-driven safety decisions, provide real-time alerts, or autonomously assist users in avoiding or mitigating hazards within home environments. As service robots become increasingly integrated into daily life--supporting tasks such as cleaning, acting as communication companions, searching for objects or navigating shared spaces [1]-[3]--their roles are expected to expand beyond single-function behaviors. With recent advances in embodied intelligence and large language models, these robots are beginning to understand complex instructions and act autonomously in diverse home environments.