Performance Analysis
GCMCG: A Clustering-Aware Graph Attention and Expert Fusion Network for Multi-Paradigm, Multi-task, and Cross-Subject EEG Decoding
Chen, Yiqiao, Huang, Zijian, He, Juchi, Xu, Fazheng, Feng, Zhenghui
Brain-Computer Interfaces (BCIs) based on Motor Execution (ME) and Motor Imagery (MI) electroencephalogram (EEG) signals offer a direct pathway for human-machine interaction. However, developing robust decoding models remains challenging due to the complex spatio-temporal dynamics of EEG, its low signal-to-noise ratio, and the limited generalizability of many existing approaches across subjects and paradigms. To address these issues, this paper proposes Graph-guided Clustering Mixture-of-Experts CNN-GRU (GCMCG), a novel unified framework for MI-ME EEG decoding. Our approach integrates a robust preprocessing stage using Independent Component Analysis and Wavelet Transform (ICA-WT) for effective denoising. We further introduce a pre-trainable graph tokenization module that dynamically models electrode relationships via a Graph Attention Network (GAT), followed by unsupervised spectral clustering to decompose signals into interpretable functional brain regions. Each region is processed by a dedicated CNN-GRU expert network, and a gated fusion mechanism with L1 regularization adaptively combines these local features with a global expert. This Mixture-of-Experts (MoE) design enables deep spatio-temporal fusion and enhances representational capacity. A three-stage training strategy incorporating focal loss and progressive sampling is employed to improve cross-subject generalization and handle class imbalance. Evaluated on three public datasets of varying complexity (EEGmmidb-BCI2000, BCI-IV 2a, and M3CV), GCMCG achieves overall accuracies of 86.60%, 98.57%, and 99.61%, respectively, which demonstrates its superior effectiveness and strong generalization capability for practical BCI applications.
Pushing the Boundaries of Interpretability: Incremental Enhancements to the Explainable Boosting Machine
Liyanage, Isara, Thayasivam, Uthayasanker
Abstract--The widespread adoption of complex machine learning models in high-stakes domains has brought the "black-box" problem to the forefront of responsible AI research. This paper aims at addressing this issue by improving the Explainable Boosting Machine (EBM), a state-of-the-art glassbox model that delivers both high accuracy and complete transparency. The paper outlines three distinct enhancement methodologies: targeted hyperparameter optimization with Bayesian methods, the implementation of a custom multi-objective function for fairness for hyperparameter optimization, and a novel self-supervised pre-training pipeline for cold-start scenarios. All three methodologies are evaluated across standard benchmark datasets, including the Adult Income, Credit Card Fraud Detection, and UCI Heart Disease datasets. The analysis indicates that while the tuning process yielded marginal improvements in the primary ROC AUC metric, it led to a subtle but important shift in the model's decision-making behavior, demonstrating the value of a multi-faceted evaluation beyond a single performance score. This work is positioned as a critical step toward developing machine learning systems that are not only accurate but also robust, equitable, and transparent, meeting the growing demands of regulatory and ethical compliance. A. The Black-Box Problem in High-Stakes Domains The remarkable surge in the performance of machine learning models has led to their pervasive adoption across a multitude of domains, from retail and finance to medicine and judicial systems. Complex, high-performing models, such as deep neural networks and ensemble methods like Random Forest and XGBoost, have become the de facto standard for many tasks.
FairMT: Fairness for Heterogeneous Multi-Task Learning
Hu, Guanyu, Lian, Tangzheng, Yan, Na, Kollias, Dimitrios, Yang, Xinyu, Celiktutan, Oya, Song, Siyang, Fu, Zeyu
Fairness in machine learning has been extensively studied in single-task settings, while fair multi-task learning (MTL), especially with heterogeneous tasks (classification, detection, regression) and partially missing labels, remains largely unexplored. Existing fairness methods are predominantly classification-oriented and fail to extend to continuous outputs, making a unified fairness objective difficult to formulate. Further, existing MTL optimization is structurally misaligned with fairness: constraining only the shared representation, allowing task heads to absorb bias and leading to uncontrolled task-specific disparities. Finally, most work treats fairness as a zero-sum trade-off with utility, enforcing symmetric constraints that achieve parity by degrading well-served groups. We introduce FairMT, a unified fairness-aware MTL framework that accommodates all three task types under incomplete supervision. At its core is an Asymmetric Heterogeneous Fairness Constraint Aggregation mechanism, which consolidates task-dependent asymmetric violations into a unified fairness constraint. Utility and fairness are jointly optimized via a primal--dual formulation, while a head-aware multi-objective optimization proxy provides a tractable descent geometry that explicitly accounts for head-induced anisotropy. Across three homogeneous and heterogeneous MTL benchmarks encompassing diverse modalities and supervision regimes, FairMT consistently achieves substantial fairness gains while maintaining superior task utility. Code will be released upon paper acceptance.
RECTor: Robust and Efficient Correlation Attack on Tor
Wu, Binghui, Divakaran, Dinil Mon, Csikor, Levente, Gurusamy, Mohan
Tor is a widely used anonymity network that conceals user identities by routing traffic through encrypted relays, yet it remains vulnerable to traffic correlation attacks that deanonymize users by matching patterns in ingress and egress traffic. However, existing correlation methods suffer from two major limitations: limited robustness to noise and partial observations, and poor scalability due to computationally expensive pairwise matching. To address these challenges, we propose RECTor, a machine learning-based framework for traffic correlation under realistic conditions. RECTor employs attention-based Multiple Instance Learning (MIL) and GRU-based temporal encoding to extract robust flow representations, even when traffic data is incomplete or obfuscated. These embeddings are mapped into a shared space via a Siamese network and efficiently matched using approximate nearest neighbor (aNN) search. Empirical evaluations show that RECTor outperforms state-of-the-art baselines such as DeepCorr, DeepCOFFEA, and FlowTracker, achieving up to 60% higher true positive rates under high-noise conditions and reducing training and inference time by over 50%. Moreover, RECTor demonstrates strong scalability: inference cost grows near-linearly as the number of flows increases. These findings reveal critical vulnerabilities in Tor's anonymity model and highlight the need for advanced model-aware defenses.
VCWorld: A Biological World Model for Virtual Cell Simulation
Wei, Zhijian, Ma, Runze, Wang, Zichen, Li, Zhongmin, Song, Shuotong, Zheng, Shuangjia
Virtual cell modeling aims to predict cellular responses to perturbations. Existing virtual cell models rely heavily on large-scale single-cell datasets, learning explicit mappings between gene expression and perturbations. Although recent models attempt to incorporate multi-source biological information, their generalization remains constrained by data quality, coverage, and batch effects. More critically, these models often function as black boxes, offering predictions without interpretability or consistency with biological principles, which undermines their credibility in scientific research. To address these challenges, we present VCWorld, a cell-level white-box simulator that integrates structured biological knowledge with the iterative reasoning capabilities of large language models to instantiate a biological world model. VCWorld operates in a data-efficient manner to reproduce perturbation-induced signaling cascades and generates interpretable, stepwise predictions alongside explicit mechanistic hypotheses. In drug perturbation benchmarks, VCWorld achieves state-of-the-art predictive performance, and the inferred mechanistic pathways are consistent with publicly available biological evidence.
Teleportation-Based Defenses for Privacy in Approximate Machine Unlearning
Maheri, Mohammad M, Cadet, Xavier, Chin, Peter, Haddadi, Hamed
Approximate machine unlearning aims to efficiently remove the influence of specific data points from a trained model, offering a practical alternative to full retraining. However, it introduces privacy risks: an adversary with access to pre- and post-unlearning models can exploit their differences for membership inference or data reconstruction. We show these vulnerabilities arise from two factors: large gradient norms of forget-set samples and the close proximity of unlearned parameters to the original model. To demonstrate their severity, we propose unlearning-specific membership inference and reconstruction attacks, showing that several state-of-the-art methods (e.g., NGP, SCRUB) remain vulnerable. To mitigate this leakage, we introduce WARP, a plug-and-play teleportation defense that leverages neural network symmetries to reduce forget-set gradient energy and increase parameter dispersion while preserving predictions. This reparameterization obfuscates the signal of forgotten data, making it harder for attackers to distinguish forgotten samples from non-members or recover them via reconstruction. Across six unlearning algorithms, our approach achieves consistent privacy gains, reducing adversarial advantage (AUC) by up to 64% in black-box and 92% in white-box settings, while maintaining accuracy on retained data. These results highlight teleportation as a general tool for reducing attack success in approximate unlearning.
Comparative Evaluation of Generative AI Models for Chest Radiograph Report Generation in the Emergency Department
Lim, Woo Hyeon, Lee, Ji Young, Lee, Jong Hyuk, Kim, Saehoon, Kim, Hyungjin
Purpose: To benchmark open-source or commercial medical image-specific VLMs against real-world radiologist-written reports. Methods: This retrospective study included adult patients who presented to the emergency department between January 2022 and April 2025 and underwent same-day CXR and CT for febrile or respiratory symptoms. Reports from five VLMs (AIRead, Lingshu, MAIRA-2, MedGemma, and MedVersa) and radiologist-written reports were randomly presented and blindly evaluated by three thoracic radiologists using four criteria: RADPEER, clinical acceptability, hallucination, and language clarity. Comparative performance was assessed using generalized linear mixed models, with radiologist-written reports treated as the reference. Finding-level analyses were also performed with CT as the reference. Results: A total of 478 patients (median age, 67 years [interquartile range, 50-78]; 282 men [59.0%]) were included. AIRead demonstrated the lowest RADPEER 3b rate (5.3% [76/1434] vs. radiologists 13.9% [200/1434]; P<.001), whereas other VLMs showed higher disagreement rates (16.8-43.0%; P<.05). Clinical acceptability was the highest with AIRead (84.5% [1212/1434] vs. radiologists 74.3% [1065/1434]; P<.001), while other VLMs performed worse (41.1-71.4%; P<.05). Hallucinations were rare with AIRead, comparable to radiologists (0.3% [4/1425]) vs. 0.1% [1/1425]; P=.21), but frequent with other models (5.4-17.4%; P<.05). Language clarity was higher with AIRead (82.9% [1189/1434]), Lingshu (88.0% [1262/1434]), and MedVersa (88.4% [1268/1434]) compared with radiologists (78.1% [1120/1434]; P<.05). Sensitivity varied substantially across VLMs for the common findings: AIRead, 15.5-86.7%; Lingshu, 2.4-86.7%; MAIRA-2, 6.0-72.0%; MedGemma, 4.8-76.7%; and MedVersa, 20.2-69.3%. Conclusion: Medical VLMs for CXR report generation exhibited variable performance in report quality and diagnostic measures.
SD-CGAN: Conditional Sinkhorn Divergence GAN for DDoS Anomaly Detection in IoT Networks
Onyeka, Henry, Samson, Emmanuel, Hong, Liang, Islam, Tariqul, Ahmed, Imtiaz, Hasan, Kamrul
Abstract--The increasing complexity of IoT edge networks presents significant challenges for anomaly detection, particularly in identifying sophisticated Denial-of-Service (DoS) attacks and zero-day exploits under highly dynamic and imbalanced traffic conditions. This paper proposes SD-CGAN, a Conditional Generative Adversarial Network framework enhanced with Sinkhorn Divergence, tailored for robust anomaly detection in IoT edge environments. The framework incorporates CTGAN-based synthetic data augmentation to address class imbalance and leverages Sinkhorn Divergence as a geometry-aware loss function to improve training stability and reduce mode collapse. The model is evaluated on exploitative attack subsets from the CICDDoS2019 dataset and compared against baseline deep learning and GAN-based approaches. Results show that SD-CGAN achieves superior detection accuracy, precision, recall, and F1-score while maintaining computational efficiency suitable for deployment in edge-enabled IoT environments. The evolution of IoT edge networks has enabled ultra-low latency applications such as autonomous vehicles, industrial automation, and mission-critical connected systems.
A Rosetta Stone for AI Benchmarks
Ho, Anson, Denain, Jean-Stanislas, Atanasov, David, Albanie, Samuel, Shah, Rohin
Most AI benchmarks saturate within years or even months after they are introduced, making it hard to study long-run trends in AI capabilities. To address this challenge, we build a statistical framework that stitches benchmarks together, putting model capabilities and benchmark difficulties on a single numerical scale. This acts as a "Rosetta Stone", allowing us to compare models across a wide range of abilities and time, even if they are not evaluated on the same benchmarks. Moreover, this works without assuming how capabilities evolve across time or with training compute. We demonstrate three applications of this framework. First, we use it to measure the speed of AI progress over time, and to forecast future AI capabilities. Second, we estimate the rate of improvements in algorithmic efficiency, finding estimates that are higher, but broadly consistent with prior work. Finally, we find that our approach can be used to detect rapid accelerations in AI progress.
Analysis of Incursive Breast Cancer in Mammograms Using YOLO, Explainability, and Domain Adaptation
Adhikari, Jayan, Joshi, Prativa, Baral, Susish
Abstract--Deep learning models for breast cancer detection from mammographic images have significant reliability problems when presented with Out-of-Distribution (OOD) inputs such as other imaging modalities (CT, MRI, X-ray) or equipment variations, leading to unreliable detection and misdiagnosis. Our strategy establishes an in-domain gallery via cosine similarity to rigidly reject non-mammographic inputs prior to processing, ensuring that only domain-associated images supply the detection pipeline. The OOD detection component achieves 99.77% general accuracy with immaculate 100% accuracy on OOD test sets, effectively eliminating irrelevant imaging modalities. ResNet50 was selected as the optimum backbone after 12 CNN architecture searches. The joint framework unites OOD robustness with high detection performance (mAP@0.5: Experimental validation establishes that OOD filtering significantly improves system reliability by preventing false alarms on out-of-distribution inputs while maintaining higher detection accuracy on mammographic data. The present study offers a fundamental foundation for the deployment of reliable AI-based breast cancer detection systems in diverse clinical environments with inherent data heterogeneity. A global health concern, breast cancer is the second-highest cause of cancer related to mortality in women. It has been recorded as the most diagnosed disease in the world in 2020 [1]. According to the World Health Organization, all types of cancer account for 626700 global deaths of women, out of which the breast is the predominant and second leading cause [2]. If diagnosed in its early development stage, the survival rate are likely to be high and the treatment cost will get reduced [3]. Studies has found that 30% breast cancer are diagnosed when the size of the mass is 30mm.