Goto

Collaborating Authors

 Performance Analysis


CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research

arXiv.org Artificial Intelligence

Variant and gene interpretation are fundamental to personalized medicine and translational biomedicine. However, traditional approaches are manual and labor-intensive. Generative language models (LMs) can facilitate this process, accelerating the translation of fundamental research into clinically-actionable insights. While existing benchmarks have attempted to quantify the capabilities of LMs for interpreting scientific data, these studies focus on narrow tasks that do not translate to real-world research. To meet these challenges, we introduce CGBench, a robust benchmark that tests reasoning capabilities of LMs on scientific publications. CGBench is built from ClinGen, a resource of expert-curated literature interpretations in clinical genetics. CGBench measures the ability to 1) extract relevant experimental results following precise protocols and guidelines, 2) judge the strength of evidence, and 3) categorize and describe the relevant outcome of experiments. We test 8 different LMs and find that while models show promise, substantial gaps exist in literature interpretation, especially on fine-grained instructions. Reasoning models excel in fine-grained tasks but non-reasoning models are better at high-level interpretations. Finally, we measure LM explanations against human explanations with an LM judge approach, revealing that models often hallucinate or misinterpret results even when correctly classifying evidence. CGBench reveals strengths and weaknesses of LMs for precise interpretation of scientific publications, opening avenues for future research in AI for clinical genetics and science more broadly.


Combining Euclidean and Hyperbolic Representations for Node-level Anomaly Detection

arXiv.org Artificial Intelligence

Node-level anomaly detection (NAD) is challenging due to diverse structural patterns and feature distributions. As such, NAD is a critical task with several applications which range from fraud detection, cybersecurity, to recommendation systems. We introduce Janus, a framework that jointly leverages Euclidean and Hyperbolic Graph Neural Networks to capture complementary aspects of node representations. Each node is described by two views, composed by the original features and structural features derived from random walks and degrees, then embedded into Euclidean and Hyperbolic spaces. A multi Graph-Autoencoder framework, equipped with a contrastive learning objective as regularization term, aligns the embeddings across the Euclidean and Hyperbolic spaces, highlighting nodes whose views are difficult to reconcile and are thus likely anomalous. Experiments on four real-world datasets show that Janus consistently outperforms shallow and deep baselines, empirically demonstrating that combining multiple geometric representations provides a robust and effective approach for identifying subtle and complex anomalies in graphs.


Think as a Doctor: An Interpretable AI Approach for ICU Mortality Prediction

arXiv.org Artificial Intelligence

Intensive Care Unit (ICU) mortality prediction, which estimates a patient's mortality status at discharge using EHRs collected early in an ICU admission, is vital in critical care. For this task, predictive accuracy alone is insufficient; interpretability is equally essential for building clinical trust and meeting regulatory standards, a topic that has attracted significant attention in information system research. Accordingly, an ideal solution should enable intrinsic interpretability and align its reasoning with three key elements of the ICU decision-making practices: clinical course identification, demographic heterogeneity, and prognostication awareness. However, conventional approaches largely focus on demographic heterogeneity, overlooking clinical course identification and prognostication awareness. Recent prototype learning methods address clinical course identification, yet the integration of the other elements into such frameworks remains underexplored. To address these gaps, we propose ProtoDoctor, a novel ICU mortality prediction framework that delivers intrinsic interpretability while integrating all three elements of the ICU decision-making practices into its reasoning process. Methodologically, ProtoDoctor features two key innovations: the Prognostic Clinical Course Identification module and the Demographic Heterogeneity Recognition module. The former enables the identification of clinical courses via prototype learning and achieves prognostication awareness using a novel regularization mechanism. The latter models demographic heterogeneity through cohort-specific prototypes and risk adjustments. Extensive empirical evaluations demonstrate that ProtoDoctor outperforms state-of-the-art baselines in predictive accuracy. Human evaluations further confirm that its interpretations are more clinically meaningful, trustworthy, and applicable in ICU practice.


Quantum Kernel Methods: Convergence Theory, Separation Bounds and Applications to Marketing Analytics

arXiv.org Artificial Intelligence

Quantum machine learning (QML) has emerged as one of the most promising near-term applications of quantum computing, with quantum kernel methods representing a particularly elegant bridge between classical machine learning theory and quantum computational advantages [17, 8, 18]. The fundamental insight underlying quantum kernels is that quantum circuits can efficiently compute inner products in exponentially large Hilbert spaces, potentially capturing data relationships that are intractable for classical methods [12]. This study presents an end-to-end feasibility test of a quantum-enhanced method for supervised classification on a real consumer dataset, evaluated with ROC analysis. We examine whether shallow, NISQ-compatible quantum embeddings and kernels can improve class separability and enable threshold-based operation along the ROC curve without retraining. We report 0.7790 accuracy, 0.7647 precision, 0.8609 recall, 0.8100 F1, and 0.83 AUC. Recent work has demonstrated empirical success of quantum support vector machines (Q-SVM) in various domains, from particle physics [21] to bioinformatics [19]. However, the theoretical foundations explaining when and why quantum advantage emerges remain incomplete. While pioneering studies by Schuld and Killoran [17] established the mathematical equivalence between variational quantum circuits and kernel methods, and Liu et al. [12] proved unconditional quantum advantage for specifically constructed problems, several fundamental questions remain open: 1. What are the convergence guarantees for variational quantum kernel optimization? 1


GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection (Preprint)

arXiv.org Artificial Intelligence

The escalating complexity of network threats and the inherent class imbalance in traffic data present formidable challenges for modern Intrusion Detection Systems (IDS). While Graph Neural Networks (GNNs) excel in modeling topological structures and Temporal Convolutional Networks (TCNs) are proficient in capturing time-series dependencies, a framework that synergistically integrates both while explicitly addressing data imbalance remains an open challenge. This paper introduces a novel deep learning framework, named Gated Temporal Convolutional Network and Graph (GTCN-G), engineered to overcome these limitations. Our model uniquely fuses a Gated TCN (G-TCN) for extracting hierarchical temporal features from network flows with a Graph Convolutional Network (GCN) designed to learn from the underlying graph structure. The core innovation lies in the integration of a residual learning mechanism, implemented via a Graph Attention Network (GAT). This mechanism preserves original feature information through residual connections, which is critical for mitigating the class imbalance problem and enhancing detection sensitivity for rare malicious activities (minority classes). We conducted extensive experiments on two public benchmark datasets, UNSW-NB15 and ToN-IoT, to validate our approach. The empirical results demonstrate that the proposed GTCN-G model achieves state-of-the-art performance, significantly outperforming existing baseline models in both binary and multi-class classification tasks.


Fused Lasso Improves Accuracy of Co-occurrence Network Inference in Grouped Samples

arXiv.org Artificial Intelligence

Co-occurrence network inference algorithms have significantly advanced our understanding of microbiome communities. However, these algorithms typically analyze microbial associations within samples collected from a single environmental niche, often capturing only static snapshots rather than dynamic microbial processes. Previous studies have commonly grouped samples from different environmental niches together without fully considering how microbial communities adapt their associations when faced with varying ecological conditions. Our study addresses this limitation by explicitly investigating both spatial and temporal dynamics of microbial communities. We analyzed publicly available microbiome abundance data across multiple locations and time points, to evaluate algorithm performance in predicting microbial associations using our proposed Same-All Cross-validation (SAC) framework. SAC evaluates algorithms in two distinct scenarios: training and testing within the same environmental niche (Same), and training and testing on combined data from multiple environmental niches (All). To overcome the limitations of conventional algorithms, we propose fuser, an algorithm that, while not entirely new in machine learning, is novel for microbiome community network inference. It retains subsample-specific signals while simultaneously sharing relevant information across environments during training. Unlike standard approaches that infer a single generalized network from combined data, fuser generates distinct, environment-specific predictive networks. Our results demonstrate that fuser achieves comparable predictive performance to existing algorithms such as glmnet when evaluated within homogeneous environments (Same), and notably reduces test error compared to baseline algorithms in cross-environment (All) scenarios.


Pattern-based Knowledge Component Extraction from Student Code Using Representation Learning

arXiv.org Artificial Intelligence

Effective personalized learning in computer science education depends on accurately modeling what students know and what they need to learn. While Knowledge Components (KCs) provide a foundation for such modeling, automated KC extraction from student code is inherently challenging due to insufficient explainability of discovered KCs and the open-endedness of programming problems with significant structural variability across student solutions and complex interactions among programming concepts. In this work, we propose a novel, explainable framework for automated KC discovery through pattern-based KCs: recurring structural patterns within student code that capture the specific programming patterns and language constructs that students must master. Toward this, we train a Variational Autoencoder to generate important representative patterns from student code guided by an explainable, attention-based code representation model that identifies important correct and incorrect pattern implementations from student code. These patterns are then clustered to form pattern-based KCs. We evaluate our KCs using two well-established methods informed by Cognitive Science: learning curve analysis and Deep Knowledge Tracing (DKT). Experimental results demonstrate meaningful learning trajectories and significant improvements in DKT predictive performance over traditional KT methods. This work advances knowledge modeling in CS education by providing an automated, scalable, and explainable framework for identifying granular code patterns and algorithmic constructs, essential for student learning.


Chain-of-Influence: Tracing Interdependencies Across Time and Features in Clinical Predictive Modelings

arXiv.org Machine Learning

Modeling clinical time-series data is hampered by the challenge of capturing latent, time-varying dependencies among features. State-of-the-art approaches often rely on black-box mechanisms or simple aggregation, failing to explicitly model how the influence of one clinical variable propagates through others over time. We propose $\textbf{Chain-of-Influence (CoI)}$, an interpretable deep learning framework that constructs an explicit, time-unfolded graph of feature interactions. CoI leverages a multi-level attention architecture: first, a temporal attention layer identifies critical time points in a patient's record; second, a cross-feature attention layer models the directed influence from features at these time points to subsequent features. This design enables the tracing of influence pathways, providing a granular audit trail that shows how any feature at any time contributes to the final prediction, both directly and through its influence on other variables. We evaluate CoI on mortality and disease progression tasks using the MIMIC-IV dataset and a private chronic kidney disease cohort. Our framework significantly outperforms existing methods in predictive accuracy. More importantly, through case studies, we show that CoI can uncover clinically meaningful, patient-specific patterns of disease progression that are opaque to other models, offering unprecedented transparency into the temporal and cross-feature dependencies that inform clinical decision-making.


Kernel Treatment Effects with Adaptively Collected Data

arXiv.org Machine Learning

Adaptive experiments improve efficiency by adjusting treatment assignments based on past outcomes, but this adaptivity breaks the i.i.d. assumptions that underpins classical asymptotics. At the same time, many questions of interest are distributional, extending beyond average effects. Kernel treatment effects (KTE) provide a flexible framework by representing counterfactual outcome distributions in an RKHS and comparing them via kernel distances. We present the first kernel-based framework for distributional inference under adaptive data collection. Our method combines doubly robust scores with variance stabilization to ensure asymptotic normality via a Hilbert-space martingale CLT, and introduces a sample-fitted stabilized test with valid type-I error. Experiments show it is well calibrated and effective for both mean shifts and higher-moment differences, outperforming adaptive baselines limited to scalar effects.


Reconstructing 12-Lead ECG from 3-Lead ECG using Variational Autoencoder to Improve Cardiac Disease Detection of Wearable ECG Devices

arXiv.org Artificial Intelligence

Twelve-lead electrocardiograms (ECGs) are the clinical gold standard for cardiac diagnosis, providing comprehensive spatial coverage of the heart necessary to detect conditions such as myocardial infarction (MI). However, their lack of portability limits continuous and large-scale use. Three-lead ECG systems are widely used in wearable devices due to their simplicity and mobility, but they often fail to capture pathologies in unmeasured regions. To address this, we propose WearECG, a Variational Autoencoder (VAE) method that reconstructs twelve-lead ECGs from three leads: II, V1, and V5. Our model includes architectural improvements to better capture temporal and spatial dependencies in ECG signals. We evaluate generation quality using MSE, MAE, and Frechet Inception Distance (FID), and assess clinical validity via a Turing test with expert cardiologists. To further validate diagnostic utility, we fine-tune ECGFounder, a large-scale pretrained ECG model, on a multi-label classification task involving over 40 cardiac conditions, including six different myocardial infarction locations, using both real and generated signals. Experiments on the MIMIC dataset show that our method produces physiologically realistic and diagnostically informative signals, with robust performance in downstream tasks. This work demonstrates the potential of generative modeling for ECG reconstruction and its implications for scalable, low-cost cardiac screening.