Performance Analysis
Uncertainty Reasoning with Photonic Bayesian Machines
Brückerhoff-Plückelmann, F., Borras, H., Hulyal, S. U., Meyer, L., Ji, X., Hu, J., Sun, J., Klein, B., Ebert, F., Dijkstra, J., McRae, L., Schmidt, P., Kippenberg, T. J., Fröning, H., Pernice, W.
Artificial intelligence (AI) systems increasingly influence safety-critical aspects of society, from medical diagnosis to autonomous mobility, making uncertainty awareness a central requirement for trustworthy AI. We present a photonic Bayesian machine that leverages the inherent randomness of chaotic light sources to enable uncertainty reasoning within the framework of Bayesian Neural Networks. The analog processor features a 1.28 Tbit/s digital interface compatible with PyTorch, enabling probabilistic convolutions processing within 37.5 ps per convolution. We use the system for simultaneous classification and out-of-domain detection of blood cell microscope images and demonstrate reasoning between aleatoric and epistemic uncertainties. The photonic Bayesian machine removes the bottleneck of pseudo random number generation in digital systems, minimizes the cost of sampling for probabilistic models, and thus enables high-speed trustworthy AI systems.
CLEF: Clinically-Guided Contrastive Learning for Electrocardiogram Foundation Models
Shu, Yuxuan, Charlton, Peter H., Kawsar, Fahim, Hernesniemi, Jussi, Malekzadeh, Mohammad
The electrocardiogram (ECG) is a key diagnostic tool in cardiovascular health. Single-lead ECG recording is integrated into both clinical-grade and consumer wearables. While self-supervised pretraining of foundation models on unlabeled ECGs improves diagnostic performance, existing approaches do not incorporate domain knowledge from clinical metadata. We introduce a novel contrastive learning approach that utilizes an established clinical risk score to adaptively weight negative pairs: clinically-guided contrastive learning. It aligns the similarities of ECG embeddings with clinically meaningful differences between subjects, with an explicit mechanism to handle missing metadata. On 12-lead ECGs from 161K patients in the MIMIC-IV dataset, we pretrain single-lead ECG foundation models at three scales, collectively called CLEF, using only routinely collected metadata without requiring per-sample ECG annotations. We evaluate CLEF on 18 clinical classification and regression tasks across 7 held-out datasets, and benchmark against 5 foundation model baselines and 3 self-supervised algorithms. When pretrained on 12-lead ECG data and tested on lead-I data, CLEF outperforms self-supervised foundation model baselines: the medium-sized CLEF achieves average AUROC improvements of at least 2.6% in classification and average reductions in MAEs of at least 3.2% in regression. Comparing with existing self-supervised learning algorithms, CLEF improves the average AUROC by at least 1.8%. Moreover, when pretrained only on lead-I data for classification tasks, CLEF performs comparably to the state-of-the-art ECGFounder, which was trained in a supervised manner. Overall, CLEF enables more accurate and scalable single-lead ECG analysis, advancing remote health monitoring. Code and pretrained CLEF models are available at: github.com/Nokia-Bell-Labs/ecg-foundation-model.
Opening the Black Box: Nowcasting Singapore's GDP Growth and its Explainability
Timely assessment of current conditions is essential especially for small, open economies such as Singapore, where external shocks transmit rapidly to domestic activity. We develop a real-time nowcasting framework for quarterly GDP growth using a high-dimensional panel of approximately 70 indicators, encompassing economic and financial indicators over 1990Q1-2023Q2. The analysis covers penalized regressions, dimensionality-reduction methods, ensemble learning algorithms, and neural architectures, benchmarked against a Random Walk, an AR(3), and a Dynamic Factor Model. The pipeline preserves temporal ordering through an expanding-window walk-forward design with Bayesian hyperparameter optimization, and uses moving block-bootstrap procedures both to construct prediction intervals and to obtain confidence bands for feature-importance measures. It adopts model-specific and XAI-based explainability tools. A Model Confidence Set procedure identifies statistically superior learners, which are then combined through simple, weighted, and exponentially weighted schemes; the resulting time-varying weights provide an interpretable representation of model contributions. Predictive ability is assessed via Giacomini-White tests. Empirical results show that penalized regressions, dimensionality-reduction models, and GRU networks consistently outperform all benchmarks, with RMSFE reductions of roughly 40-60%; aggregation delivers further gains. Feature-attribution methods highlight industrial production, external trade, and labor-market indicators as dominant drivers of Singapore's short-run growth dynamics.
Large Language Model based Smart Contract Auditing with LLMBugScanner
Yuan, Yining, Wang, Yifei, Xu, Yichang, Yahn, Zachary, Hu, Sihao, Liu, Ling
This paper presents LLMBugScanner, a large language model (LLM) based framework for smart contract vulnerability detection using fine-tuning and ensemble learning. Smart contract auditing presents several challenges for LLMs: different pretrained models exhibit varying reasoning abilities, and no single model performs consistently well across all vulnerability types or contract structures. These limitations persist even after fine-tuning individual LLMs. To address these challenges, LLMBugScanner combines domain knowledge adaptation with ensemble reasoning to improve robustness and generalization. Through domain knowledge adaptation, we fine-tune LLMs on complementary datasets to capture both general code semantics and instruction-guided vulnerability reasoning, using parameter-efficient tuning to reduce computational cost. Through ensemble reasoning, we leverage the complementary strengths of multiple LLMs and apply a consensus-based conflict resolution strategy to produce more reliable vulnerability assessments. We conduct extensive experiments across multiple popular LLMs and compare LLMBugScanner with both pretrained and fine-tuned individual models. Results show that LLMBugScanner achieves consistent accuracy improvements and stronger generalization, demonstrating that it provides a principled, cost-effective, and extensible framework for smart contract auditing.
Leveraging AI multimodal geospatial foundation models for improved near-real-time flood mapping at a global scale
Tulbure, Mirela G., Caineta, Julio, Broich, Mark, Gaines, Mollie D., Rufin, Philippe, Thomas, Leon-Friedrich, Alemohammad, Hamed, Hemmerling, Jan, Hostert, Patrick
Floods are among the most damaging weather-related hazards, and in 2024, the warmest year on record, extreme flood events affected communities across five continents. Earth observation (EO) satellites provide critical, frequent coverage for mapping inundation, yet operational accuracy depends heavily on labeled datasets and model generalization. Recent Geospatial Foundation Models (GFMs), such as ESA-IBM's TerraMind, offer improved generalizability through large-scale self-supervised pretraining, but their performance on diverse global flood events remains poorly understood. We fine-tune TerraMind for flood extent mapping using FloodsNet, a harmonized multimodal dataset containing co-located Sentinel-1 (Synthetic Aperture Radar, SAR data) and Sentinel-2 (optical) imagery for 85 flood events worldwide. We tested four configurations (base vs. large models; frozen vs. unfrozen backbones) and compared against the TerraMind Sen1Floods11 example and a U-Net trained on both FloodsNet and Sen1Floods11. The base-unfrozen configuration provided the best balance of accuracy, precision, and recall at substantially lower computational cost than the large model. The large unfrozen model achieved the highest recall. Models trained on FloodsNet outperformed the Sen1Floods11-trained example in recall with similar overall accuracy. U-Net achieved higher recall than all GFM configurations, though with slightly lower accuracy and precision. Our results demonstrate that integrating multimodal optical and SAR data and fine-tuning a GFM can enhance near-real-time flood mapping. This study provides one of the first global-scale evaluations of a GFM for flood segmentation, highlighting both its potential and current limitations for climate adaptation and disaster resilience.
Contextual Gating within the Transformer Stack: Synergistic Feature Modulation for Enhanced Lyrical Classification and Calibration
This study introduces a significant architectural advancement in feature fusion for lyrical content classification by integrating auxiliary structural features directly into the self-attention mechanism of a pre-trained Transformer. I propose the SFL Transformer, a novel deep learning model that utilizes a Contextual Gating mechanism (an Intermediate SFL) to modulate the sequence of hidden states within the BERT encoder stack, rather than fusing features at the final output layer. This approach modulates the deep, contextualized semantic features (Hseq) using low-dimensional structural cues (Fstruct). The model is applied to a challenging binary classification task derived from UMAP-reduced lyrical embeddings. The SFL Transformer achieved an Accuracy of 0.9910 and a Macro F1 score of 0.9910, significantly improving the state-of-the-art established by the previously published SFL model (Accuracy 0.9894). Crucially, this Contextual Gating strategy maintained exceptional reliability, with a low Expected Calibration Error (ECE = 0.0081) and Log Loss (0.0489). This work validates the hypothesis that injecting auxiliary context mid-stack is the most effective means of synergistically combining structural and semantic information, creating a model with both superior discriminative power and high-fidelity probability estimates.
CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design
Gao, Zijun, He, Mutian, Sun, Shijia, Cao, Hanqun, Zhang, Jingjie, Luo, Zihao, Wang, Xiaorui, Yao, Xiaojun, Hsieh, Chang-Yu, Gu, Chunbin, Heng, Pheng Ann
Reliable evaluation of protein structure predictions remains challenging, as metrics like pLDDT capture energetic stability but often miss subtle errors such as atomic clashes or conformational traps reflecting topological frustration within the protein folding energy landscape. We present CODE (Chain of Diffusion Embeddings), a self evaluating metric empirically found to quantify topological frustration directly from the latent diffusion embeddings of the AlphaFold3 series of structure predictors in a fully unsupervised manner. Integrating this with pLDDT, we propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives to improve the reliability of AlphaFold3 and related models. CODE strongly correlates with protein folding rates driven by topological frustration, achieving a correlation of 0.82 compared to pLDDT's 0.33 (a relative improvement of 148\%). CONFIDE significantly enhances the reliability of quality evaluation in molecular glue structure prediction benchmarks, achieving a Spearman correlation of 0.73 with RMSD, compared to pLDDT's correlation of 0.42, a relative improvement of 73.8\%. Beyond quality assessment, our approach applies to diverse drug design tasks, including all-atom binder design, enzymatic active site mapping, mutation induced binding affinity prediction, nucleic acid aptamer screening, and flexible protein modeling. By combining data driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems, offering robust and versatile tools to refine structure predictions, advance structural biology, and accelerate drug discovery.
Seizure-NGCLNet: Representation Learning of SEEG Spatial Pathological Patterns for Epileptic Seizure Detection via Node-Graph Dual Contrastive Learning
Wang, Yiping, Wang, Peiren, Li, Zhenye, Liu, Fang, Huang, Jinguo
Complex spatial connectivity patterns, such as interictal suppression and ictal propagation, complicate accurate drug-resistant epilepsy (DRE) seizure detection using stereotactic electroencephalography (SEEG) and traditional machine learning methods. Two critical challenges remain:(1)a low signal-to-noise ratio in functional connectivity estimates, making it difficult to learn seizure-related interactions; and (2)expert labels for spatial pathological connectivity patterns are difficult to obtain, meanwhile lacking the patterns' representation to improve seizure detection. To address these issues, we propose a novel node-graph dual contrastive learning framework, Seizure-NGCLNet, to learn SEEG interictal suppression and ictal propagation patterns for detecting DRE seizures with high precision. First, an adaptive graph augmentation strategy guided by centrality metrics is developed to generate seizure-related brain networks. Second, a dual-contrastive learning approach is integrated, combining global graph-level contrast with local node-graph contrast, to encode both spatial structural and semantic epileptogenic features. Third, the pretrained embeddings are fine-tuned via a top-k localized graph attention network to perform the final classification. Extensive experiments on a large-scale public SEEG dataset from 33 DRE patients demonstrate that Seizure-NGCLNet achieves state-of-the-art performance, with an average accuracy of 95.93%, sensitivity of 96.25%, and specificity of 94.12%. Visualizations confirm that the learned embeddings clearly separate ictal from interictal states, reflecting suppression and propagation patterns that correspond to the clinical mechanisms. These results highlight Seizure-NGCLNet's ability to learn interpretable spatial pathological patterns, enhancing both seizure detection and seizure onset zone localization.
DySTAN: Joint Modeling of Sedentary Activity and Social Context from Smartphone Sensors
Sneh, Aditya, Sahu, Nilesh Kumar, Gupta, Snehil, Lone, Haroon R.
Accurately recognizing human context from smartphone sensor data remains a significant challenge, especially in sedentary settings where activities such as studying, attending lectures, relaxing, and eating exhibit highly similar inertial patterns. Furthermore, social context plays a critical role in understanding user behavior, yet is often overlooked in mobile sensing research. To address these gaps, we introduce LogMe, a mobile sensing application that passively collects smartphone sensor data (accelerometer, gyroscope, magnetometer, and rotation vector) and prompts users for hourly self-reports capturing both sedentary activity and social context. Using this dual-label dataset, we propose DySTAN (Dynamic Cross-Stitch with Task Attention Network), a multi-task learning framework that jointly classifies both context dimensions from shared sensor inputs. It integrates task-specific layers with cross-task attention to model subtle distinctions effectively. DySTAN improves sedentary activity macro F1 scores by 21.8% over a single-task CNN-BiLSTM-GRU (CBG) model and by 8.2% over the strongest multi-task baseline, Sluice Network (SN). These results demonstrate the importance of modeling multiple, co-occurring context dimensions to improve the accuracy and robustness of mobile context recognition.
Graphing the Truth: Structured Visualizations for Automated Hallucination Detection in LLMs
Large Language Models have rapidly advanced in their ability to interpret and generate natural language. In enterprise settings, they are frequently augmented with closed-source domain knowledge to deliver more contextually informed responses. However, operational constraints such as limited context windows and inconsistencies between pre-training data and supplied knowledge often lead to hallucinations, some of which appear highly credible and escape routine human review. Current mitigation strategies either depend on costly, large-scale gold-standard Q\&A curation or rely on secondary model verification, neither of which offers deterministic assurance. This paper introduces a framework that organizes proprietary knowledge and model-generated content into interactive visual knowledge graphs. The objective is to provide end users with a clear, intuitive view of potential hallucination zones by linking model assertions to underlying sources of truth and indicating confidence levels. Through this visual interface, users can diagnose inconsistencies, identify weak reasoning chains, and supply corrective feedback. The resulting human-in-the-loop workflow creates a structured feedback loop that can enhance model reliability and continuously improve response quality.