Performance Analysis
Hybrid Ensemble of Segmentation-Assisted Classification and GBDT for Skin Cancer Detection with Engineered Metadata and Synthetic Lesions from ISIC 2024 Non-Dermoscopic 3D-TBP Images
Hasan, Muhammad Zubair, Rifat, Fahmida Yasmin
Skin cancer is among the most prevalent and life-threatening diseases worldwide, with early detection being critical to patient outcomes. This work presents a hybrid machine and deep learning-based approach for classifying malignant and benign skin lesions using the SLICE-3D dataset from ISIC 2024, which comprises 401,059 cropped lesion images extracted from 3D Total Body Photography (TBP), emulating non-dermoscopic, smartphone-like conditions. Our method combines vision transformers (EVA02) and our designed convolutional ViT hybrid (EdgeNeXtSAC) to extract robust features, employing a segmentation-assisted classification pipeline to enhance lesion localization. Predictions from these models are fused with a gradient-boosted decision tree (GBDT) ensemble enriched by engineered features and patient-specific relational metrics. To address class imbalance and improve generalization, we augment malignant cases with Stable Diffusion-generated synthetic lesions and apply a diagnosis-informed relabeling strategy to harmonize external datasets into a 3-class format. Using partial AUC (pAUC) above 80 percent true positive rate (TPR) as the evaluation metric, our approach achieves a pAUC of 0.1755, the highest among all configurations. These results underscore the potential of hybrid, interpretable AI systems for skin cancer triage in telemedicine and resource-constrained settings.
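The evaluation metric in the abstract above, partial AUC above 80 percent TPR, can be illustrated with a minimal pure-Python sketch. It measures the area between the ROC curve and the horizontal line TPR = 0.8, so the value lies between 0 and 0.2. The function names `roc_points` and `pauc_above_tpr` are hypothetical, and this is one plausible reading of the metric rather than the authors' evaluation code.

```python
def roc_points(y_true, y_score):
    """Return ROC points (fpr, tpr), starting at (0, 0), thresholds descending."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def pauc_above_tpr(y_true, y_score, min_tpr=0.8):
    """Area between the ROC curve and the line tpr = min_tpr (trapezoid rule)."""
    area = 0.0
    pts = roc_points(y_true, y_score)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if y1 <= min_tpr or x1 == x0:
            continue  # segment lies below the threshold or is vertical
        if y0 < min_tpr:
            # Clip the segment at the point where it crosses tpr = min_tpr.
            x0 = x0 + (min_tpr - y0) * (x1 - x0) / (y1 - y0)
            y0 = min_tpr
        area += (x1 - x0) * ((y0 - min_tpr) + (y1 - min_tpr)) / 2.0
    return area


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(pauc_above_tpr(labels, [0.9, 0.8, 0.2, 0.1]))  # perfectly ranked scores
    print(pauc_above_tpr(labels, [0.5, 0.5, 0.5, 0.5]))  # uninformative scores
```

Under this definition a perfect ranking reaches the ceiling of 0.2 and an uninformative scorer gets 0.02, which gives some context for the reported 0.1755.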
A Foundation Model for Spatial Proteomics
Shaban, Muhammad, Chang, Yuzhou, Qiu, Huaying, Yeo, Yao Yu, Song, Andrew H., Jaume, Guillaume, Wang, Yuchen, Weishaupt, Luca L., Ding, Tong, Vaidya, Anurag, Lamane, Abdallah, Shao, Daniel, Zidane, Mohammed, Bai, Yunhao, McCallum, Paige, Luo, Shuli, Wu, Wenrui, Wang, Yang, Cramer, Precious, Chan, Chi Ngai, Stephan, Pierre, Schaffenrath, Johanna, Lee, Jia Le, Michel, Hendrik A., Tian, Caiwei, Almagro-Perez, Cristina, Wagner, Sophia J., Sahai, Sharifa, Lu, Ming Y., Chen, Richard J., Zhang, Andrew, Gonzales, Mark Edward M., Makky, Ahmad, Lee, Jia-Ying Joey, Cheng, Hao, Ahmar, Nourhan El, Matar, Sayed, Haist, Maximilian, Phillips, Darci, Tan, Yuqi, Nolan, Garry P., Burack, W. Richard, Estes, Jacob D., Liu, Jonathan T. C., Choueiri, Toni K, Agarwal, Neeraj, Barry, Marc, Rodig, Scott J., Le, Long Phi, Gerber, Georg, Schürch, Christian M., Theis, Fabian J., Kim, Youn H, Yeong, Joe, Signoretti, Sabina, Howitt, Brooke E., Loo, Lit-Hsin, Ma, Qin, Jiang, Sizun, Mahmood, Faisal
Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-supervised manner on over 47 million image patches covering 175 protein markers, 16 tissue types, and 8 fluorescence-based imaging platforms. We introduce key architectural adaptations to address the high-dimensional, multi-channel, and heterogeneous nature of multiplex imaging. We demonstrate that KRONOS learns biologically meaningful representations across multiple scales, ranging from cellular and microenvironment to tissue levels, enabling it to address diverse downstream tasks, including cell phenotyping, region classification, and patient stratification. Evaluated across 11 independent cohorts, KRONOS achieves state-of-the-art performance across cell phenotyping, treatment response prediction, and retrieval tasks, and is highly data-efficient. KRONOS also introduces the paradigm of segmentation-free patch-level processing for efficient and scalable spatial proteomics analysis, allowing cross-institutional comparisons and serving as an image reverse search engine for spatial patterns.
Cross-Platform Violence Detection on Social Media: A Dataset and Analysis
Chen, Celia, Beland, Scotty, Burghardt, Ingo, Byczek, Jill, Conway, William J., Cotugno, Eric, Davre, Sadaf, Fletcher, Megan, Gnanasekaran, Rajesh Kumar, Hamilton, Kristin, Harbert, Marilyn, Heustis, Jordan, Jha, Tanaya, Klein, Emily, Kramer, Hayden, Leitch, Alex, Perkins, Jessica, Sherman, Casi, Sterrn, Celia, Stevens, Logan, Zarrella, Rebecca, Golbeck, Jennifer
Violent threats remain a significant problem across social media platforms. Useful, high-quality data facilitates research into the understanding and detection of malicious content, including violence. In this paper, we introduce a cross-platform dataset of 30,000 posts hand-coded for violent threats and sub-types of violence, including political and sexual violence. To evaluate the signal present in this dataset, we perform a machine learning analysis with an existing dataset of violent comments from YouTube. We find that, despite originating from different platforms and using different coding criteria, we achieve high classification accuracy both by training on one dataset and testing on the other, and in a merged dataset condition. These results have implications for content-classification strategies and for understanding violent content across social media.
Predicting Postoperative Stroke in Elderly SICU Patients: An Interpretable Machine Learning Model Using MIMIC Data
Li, Tinghuan, Chen, Shuheng, Fan, Junyi, Pishgar, Elham, Alaei, Kamiar, Placencia, Greg, Pishgar, Maryam
Postoperative stroke remains a critical complication in elderly surgical intensive care unit (SICU) patients, contributing to prolonged hospitalization, elevated healthcare costs, and increased mortality. Accurate early risk stratification is essential to enable timely intervention and improve clinical outcomes. We constructed a combined cohort of 19,085 elderly SICU admissions from the MIMIC-III and MIMIC-IV databases and developed an interpretable machine learning (ML) framework to predict in-hospital stroke using clinical data from the first 24 hours of Intensive Care Unit (ICU) stay. The preprocessing pipeline included removal of high-missingness features, iterative Singular Value Decomposition (SVD) imputation, z-score normalization, one-hot encoding, and class imbalance correction via the Adaptive Synthetic Sampling (ADASYN) algorithm. A two-stage feature selection process, combining Recursive Feature Elimination with Cross-Validation (RFECV) and SHapley Additive exPlanations (SHAP), reduced the initial 80 variables to 20 clinically informative predictors. Among eight ML models evaluated, CatBoost achieved the best performance with an AUROC of 0.8868 (95% CI: 0.8802-0.8937). SHAP analysis and ablation studies identified prior cerebrovascular disease, serum creatinine, and systolic blood pressure as the most influential risk factors. Our results highlight the potential of interpretable ML approaches to support early detection of postoperative stroke and inform decision-making in perioperative critical care.
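The abstract above reports an AUROC with a 95% confidence interval but does not say how the interval was obtained. The sketch below shows one common option, a percentile bootstrap over resampled cases. It is an assumption for illustration, not the authors' procedure. The function names, the number of resamples, and the seed are all hypothetical.

```python
import random


def auroc(y_true, y_score):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap interval for AUROC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if len(set(labels)) < 2:
            continue  # a resample with a single class has no defined AUROC
        stats.append(auroc(labels, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
    print(auroc(y, s), bootstrap_ci(y, s, n_boot=200))
```

The pairwise AUROC is quadratic in the number of cases, which is fine for a toy example but would be replaced by a rank-based computation on a cohort of this size.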
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Li, Qi, Yu, Runpeng, Wang, Xinchao
Multimodal large language models (MLLMs) demonstrate remarkable capabilities in handling complex multimodal tasks and are increasingly adopted in video understanding applications. However, their rapid advancement raises serious data privacy concerns, particularly given the potential inclusion of sensitive video content, such as personal recordings and surveillance footage, in their training datasets. Determining improperly used videos during training remains a critical and unresolved challenge. Despite considerable progress on membership inference attacks (MIAs) for text and image data in MLLMs, existing methods fail to generalize effectively to the video domain. These methods suffer from poor scalability as more frames are sampled and generally achieve negligible true positive rates at low false positive rates (TPR@Low FPR), mainly due to their failure to capture the inherent temporal variations of video frames and to account for model behavior differences as the number of frames varies. To address these challenges, we introduce Vid-SME, the first membership inference method tailored for video data used in video understanding LLMs (VULLMs). Vid-SME leverages the confidence of model output and integrates adaptive parameterization to compute Sharma-Mittal entropy (SME) for video inputs. By leveraging the SME difference between natural and temporally-reversed video frames, Vid-SME derives robust membership scores to determine whether a given video is part of the model's training set. Experiments on various self-trained and open-sourced VULLMs demonstrate the strong effectiveness of Vid-SME.
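For readers unfamiliar with the Sharma-Mittal entropy mentioned above, the sketch below computes its standard two-parameter form for a discrete distribution, S = ((sum_i p_i^q)^((1-r)/(1-q)) - 1) / (1 - r), for q and r different from 1. The helper `mean_sme_gap` is a hypothetical illustration of comparing entropies between natural and reversed frame orders. The adaptive parameterization and the actual membership score used by Vid-SME are not described in the abstract and are not reproduced here.

```python
def sharma_mittal_entropy(probs, q, r):
    """Sharma-Mittal entropy of a discrete distribution, for q != 1 and r != 1."""
    if q == 1 or r == 1:
        raise ValueError("this sketch covers only q != 1 and r != 1")
    power_sum = sum(p ** q for p in probs if p > 0)
    return (power_sum ** ((1 - r) / (1 - q)) - 1) / (1 - r)


def mean_sme_gap(natural_dists, reversed_dists, q, r):
    """Hypothetical helper: mean entropy on natural order minus reversed order.

    Each argument is a list of per-token probability distributions that a
    model is assumed to have produced for the same video in the two orders.
    """
    def mean_entropy(dists):
        return sum(sharma_mittal_entropy(d, q, r) for d in dists) / len(dists)

    return mean_entropy(natural_dists) - mean_entropy(reversed_dists)


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    peaked = [1.0, 0.0, 0.0, 0.0]
    print(sharma_mittal_entropy(uniform, q=2, r=2))
    print(mean_sme_gap([uniform], [peaked], q=2, r=2))
```

Setting r equal to q recovers the Tsallis entropy, and the limits r to 1 and q, r to 1 give the Rényi and Shannon entropies respectively. Those limits need separate handling and are left out of this sketch.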
Temporal Causal-based Simulation for Realistic Time-series Generation
Gkorgkolis, Nikolaos, Kougioulis, Nikolaos, Wang, MingXue, Caglayan, Bora, Tonon, Andrea, Simionato, Dario, Tsamardinos, Ioannis
Causal discovery (CD) plays a pivotal role in revealing relationships among observed variables, particularly in the temporal setting. While the majority of CD methods rely on synthetic data for evaluation, and recently for training, such data fall short of accurately mirroring real-world scenarios, an effect even more evident in temporal data. Generation techniques that depend on simplified assumptions about causal structure, effects, and time limit the quality and diversity of the simulated data. In this work, we introduce Temporal Causal-based Simulation (TCS), a robust framework for generating realistic time-series data and their associated temporal causal graphs. The approach is structured in three phases: estimating the true lagged causal structure of the data, approximating the functional dependencies between variables, and learning the noise distribution of the corresponding causal model, each of which can be explicitly tailored to data assumptions and characteristics. Through an extensive evaluation process, we show that single detection methods are inadequate for discriminating generated from real data, underscoring that this is a multifaceted challenge. To address this, we detail a min-max optimization phase that draws on AutoML techniques. Our contributions include a flexible, model-agnostic pipeline for generating realistic temporal causal data, a thorough evaluation setup that enhances the validity of the generated datasets, and insights into the challenges posed by realistic data generation. Through experiments involving not only real but also semi-synthetic and purely synthetic datasets, we demonstrate that while sampling realistic causal data remains a complex task, our method enriches the domain of generating sensible causal-based temporal data.
On the Need to Align Intent and Implementation in Uncertainty Quantification for Machine Learning
Trivedi, Shubhendu, Nord, Brian D.
Quantifying uncertainties for machine learning (ML) models is a foundational challenge in modern data analysis. This challenge is compounded by at least two key aspects of the field: (a) inconsistent terminology surrounding uncertainty and estimation across disciplines, and (b) the varying technical requirements for establishing trustworthy uncertainties in diverse problem contexts. In this position paper, we aim to clarify the depth of these challenges by identifying these inconsistencies and articulating how different contexts impose distinct epistemic demands. We examine the current landscape of estimation targets (e.g., prediction, inference, simulation-based inference), uncertainty constructs (e.g., frequentist, Bayesian, fiducial), and the approaches used to map between them. Drawing on the literature, we highlight and explain examples of problematic mappings. To help address these issues, we advocate for standards that promote alignment between the intent and implementation of uncertainty quantification (UQ) approaches. We discuss several axes of trustworthiness that are necessary (if not sufficient) for reliable UQ in ML models, and show how these axes can inform the design and evaluation of uncertainty-aware ML systems. Our practical recommendations focus on scientific ML, offering illustrative cases and use scenarios, particularly in the context of simulation-based inference (SBI).
Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings
Zenati, Houssam, Bozkurt, Bariscan, Gretton, Arthur
Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling flexible and nonparametric distributional off-policy evaluation. We introduce both a plug-in estimator and a doubly robust estimator; the latter enjoys improved uniform convergence rates by correcting for bias in both the outcome embedding and propensity models. Building on this, we develop a doubly robust kernel test statistic for hypothesis testing, which achieves asymptotic normality and thus enables computationally efficient testing and straightforward construction of confidence intervals. Our framework also supports sampling from the counterfactual distribution. Numerical simulations illustrate the practical benefits of CPME over existing methods.
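The doubly robust idea referenced above can be seen in its simplest scalar form: a model-based estimate of the policy value plus an importance-weighted correction of the model's residuals. The sketch below estimates a scalar mean outcome for a discrete action space. It is only an analogue and does not implement the RKHS embedding or the kernel test from the paper. All function and argument names are hypothetical.

```python
def doubly_robust_value(contexts, actions, outcomes, target_policy,
                        logging_propensity, outcome_model, action_space):
    """Scalar doubly robust off-policy value estimate (illustrative analogue).

    target_policy(a, x) and logging_propensity(a, x) return action
    probabilities; outcome_model(x, a) returns a predicted outcome.
    """
    total = 0.0
    for x, a, y in zip(contexts, actions, outcomes):
        # Direct-method term: predicted outcome averaged under the target policy.
        direct = sum(target_policy(b, x) * outcome_model(x, b) for b in action_space)
        # Importance-weighted correction of the outcome model's residual.
        weight = target_policy(a, x) / logging_propensity(a, x)
        total += direct + weight * (y - outcome_model(x, a))
    return total / len(outcomes)


if __name__ == "__main__":
    # Toy logged data: uniform logging over {0, 1}, outcome equals the action.
    xs, acts, ys = [None] * 4, [0, 1, 0, 1], [0.0, 1.0, 0.0, 1.0]
    always_one = lambda a, x: 1.0 if a == 1 else 0.0
    uniform = lambda a, x: 0.5
    for model in (lambda x, a: 0.0, lambda x, a: float(a)):
        print(doubly_robust_value(xs, acts, ys, always_one, uniform, model, [0, 1]))
```

In the toy data the estimate recovers the true value of 1.0 both with a deliberately wrong outcome model (paired with correct propensities) and with a correct one, which is the property the term "doubly robust" refers to.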
AI-Driven Vehicle Condition Monitoring with Cell-Aware Edge Service Migration
Kalalas, Charalampos, Mulinka, Pavol, Belmonte, Guillermo Candela, Fornell, Miguel, Dalgitsis, Michail, Vera, Francisco Paredes, Sánchez, Javier Santaella, Villares, Carmen Vicente, Sedar, Roshan, Datsika, Eftychia, Antonopoulos, Angelos, Ojea, Antonio Fernández, Payaro, Miquel
Artificial intelligence (AI) has been increasingly applied to the condition monitoring of vehicular equipment, aiming to enhance maintenance strategies, reduce costs, and improve safety. Leveraging the edge computing paradigm, AI-based condition monitoring systems process vast streams of vehicular data to detect anomalies and optimize operational performance. In this work, we introduce a novel vehicle condition monitoring service that enables real-time diagnostics of a diverse set of anomalies while remaining practical for deployment in real-world edge environments. To address mobility challenges, we propose a closed-loop service orchestration framework where service migration across edge nodes is dynamically triggered by network-related metrics. Our approach has been implemented and tested in a real-world race circuit environment equipped with 5G network capabilities under diverse operational conditions. Experimental results demonstrate the effectiveness of our framework in ensuring low-latency AI inference and adaptive service placement, highlighting its potential for intelligent transportation and mobility applications.
Stereotypical gender actions can be extracted from Web text
Herdağdelen, Amaç, Baroni, Marco
Online social networks and micro-blogging services are no longer limited to the followers of the latest technologies or teenagers, as might once have been expected. Such technology and services are becoming widely adopted by the mainstream population as an integral part of their daily lives (Fox et al., 2009). A very prominent example of such an application is Twitter, a micro-blogging service. Twitter lets its users post very short (at most 140-character) messages, called tweets, about what they have been doing or thinking, or what they want to share with their friends and other people. Every day, tens of millions of tweets are posted by users worldwide. The proliferation of publicly available, user-generated content is a vast source of social data and is already shaping the field of computational social science (Lazer et al., 2009; Thelwall et al., 2010a). Another field that benefits from the abundance of Web-based text is knowledge extraction and automated ontology building. An example application is KNEXT (Knowledge Extraction from Text), a system proposed for extracting "general world knowledge from miscellaneous texts, including fiction" (Schubert and Tong, 2003). Web-based text is increasingly used as a source for everyday knowledge (frequently referred to as commonsense knowledge).