Xiao, Ran
Fusion of ECG Foundation Model Embeddings to Improve Early Detection of Acute Coronary Syndromes
Meng, Zeyuan, Panchumarthi, Lovely Yeswanth, Kataria, Saurabh, Fedorov, Alex, Zègre-Hemsey, Jessica, Hu, Xiao, Xiao, Ran
Acute Coronary Syndrome (ACS) is a life-threatening cardiovascular condition where early and accurate diagnosis is critical for effective treatment and improved patient outcomes. This study explores the use of ECG foundation models, specifically ST-MEM and ECG-FM, to enhance ACS risk assessment using prehospital ECG data collected in ambulances. Both models leverage self-supervised learning (SSL), with ST-MEM using a reconstruction-based approach and ECG-FM employing contrastive learning, capturing unique spatial and temporal ECG features. We evaluate the performance of these models individually and through a fusion approach, where their embeddings are combined for enhanced prediction. Results demonstrate that both foundation models outperform a baseline ResNet-50 model, with the fusion-based approach achieving the highest performance (AUROC: 0.843 ± 0.006, AUCPR: 0.674 ± 0.012). These findings highlight the potential of ECG foundation models for early ACS detection and motivate further exploration of advanced fusion strategies to maximize complementary feature utilization.
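As a concrete illustration of the fusion approach described above, the following minimal Python sketch concatenates embeddings from the two frozen encoders and trains a simple classifier head on them. The stmem_encode and ecgfm_encode functions are hypothetical stand-ins for the pretrained ST-MEM and ECG-FM models; the study's actual feature extraction and classifier head are not specified in the abstract.

# Minimal sketch of late fusion of two ECG foundation-model embeddings.
# `stmem_encode` and `ecgfm_encode` are hypothetical stand-ins for the
# pretrained ST-MEM and ECG-FM encoders; the real interfaces may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fuse_embeddings(ecg_batch, stmem_encode, ecgfm_encode):
    """Concatenate per-recording embeddings from both frozen encoders."""
    z_stmem = np.asarray([stmem_encode(x) for x in ecg_batch])  # (N, d1)
    z_ecgfm = np.asarray([ecgfm_encode(x) for x in ecg_batch])  # (N, d2)
    return np.concatenate([z_stmem, z_ecgfm], axis=1)           # (N, d1+d2)

# Train a simple classifier head on the fused representation.
# X_train: list of prehospital ECG recordings; y_train: ACS labels (0/1).
# clf = LogisticRegression(max_iter=1000).fit(
#     fuse_embeddings(X_train, stmem_encode, ecgfm_encode), y_train)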
Continuous Cardiac Arrest Prediction in ICU using PPG Foundation Model
Kataria, Saurabh, Xiao, Ran, Ruchti, Timothy, Clark, Matthew, Lu, Jiaying, Lee, Randall J., Grunwell, Jocelyn, Hu, Xiao
Non-invasive patient monitoring for tracking and predicting adverse acute health events is an emerging area of research. We pursue in-hospital cardiac arrest (IHCA) prediction using only single-channel finger photoplethysmography (PPG) signals. Our proposed two-stage model, the Feature Extractor-Aggregator Network (FEAN), leverages powerful representations from pre-trained PPG foundation models (PPG-GPT, up to 1 billion parameters) stacked with sequential classification models. We propose two FEAN variants ("1H" and "FH"), which use the latest one-hour and (at most) 24-hour history, respectively, to make decisions. Our study is the first to present IHCA prediction results in ICU patients using only unimodal (continuous PPG signal) waveform deep representations. With our best model, we obtain an average AUROC of 0.79 over a 24-hour prediction window before CA event onset, with performance peaking at 0.82 one hour before CA. We also provide a comprehensive analysis of our model through architectural tuning and PaCMAP visualization of patient health trajectories in latent space.
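A minimal PyTorch sketch of the two-stage Feature Extractor-Aggregator pattern follows, assuming a frozen PPG foundation model has already mapped each PPG window to an embedding. The GRU aggregator is an illustrative choice: the abstract only states that sequential classification models are stacked on the foundation-model representations, so the paper's exact classifier may differ.

# Sketch of a Feature Extractor-Aggregator pattern: a frozen PPG foundation
# model embeds each signal window, and a sequential model aggregates the
# window embeddings into one cardiac-arrest risk score. The GRU aggregator
# is an illustrative assumption, not the paper's confirmed architecture.
import torch
import torch.nn as nn

class Aggregator(nn.Module):
    def __init__(self, emb_dim: int, hidden: int = 128):
        super().__init__()
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, window_embs: torch.Tensor) -> torch.Tensor:
        # window_embs: (batch, n_windows, emb_dim), e.g. the last hour ("1H")
        # or up to 24 hours ("FH") of history.
        _, h_n = self.rnn(window_embs)
        return torch.sigmoid(self.head(h_n[-1]))  # (batch, 1) risk score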
Baichuan-Omni-1.5 Technical Report
Li, Yadong, Liu, Jun, Zhang, Tao, Zhang, Tao, Chen, Song, Li, Tianpeng, Li, Zehuan, Liu, Lijun, Ming, Lingfeng, Dong, Guosheng, Pan, Da, Li, Chong, Fang, Yuanbo, Kuang, Dongdong, Wang, Mingrui, Zhu, Chenglin, Zhang, Youwei, Guo, Hongyu, Zhang, Fengyu, Wang, Yuran, Ding, Bowen, Song, Wei, Li, Xu, Huo, Yuqi, Liang, Zheng, Zhang, Shusen, Wu, Xin, Zhao, Shuai, Xiong, Linchu, Wu, Yozhen, Ye, Jiahui, Lu, Wenhao, Li, Bowen, Zhang, Yan, Zhou, Yaqi, Chen, Xin, Su, Lei, Zhang, Hongda, Chen, Fuzhong, Dong, Xuezhen, Nie, Na, Wu, Zhiying, Xiao, Bin, Li, Ting, Dang, Shunya, Zhang, Ping, Sun, Yijia, Wu, Jincheng, Yang, Jinjie, Lin, Xionghai, Ma, Zhi, Wu, Kegeng, Li, Jia, Yang, Aiyuan, Liu, Hui, Zhang, Jianqiang, Chen, Xiaoxi, Ai, Guangwei, Zhang, Wentao, Chen, Yicong, Huang, Xiaoqin, Li, Kun, Luo, Wenjing, Duan, Yifei, Zhu, Lingling, Xiao, Ran, Su, Zhe, Pu, Jiani, Wang, Dian, Jia, Xu, Zhang, Tianyu, Ai, Mengyu, Wang, Mang, Qiao, Yujing, Zhang, Lei, Shen, Yanjun, Yang, Fan, Zhen, Miao, Zhou, Yijie, Chen, Mingyang, Li, Fei, Zhu, Chenzheng, Lu, Keer, Zhao, Yaqi, Liang, Hao, Li, Youquan, Qin, Yanzhao, Sun, Linzhuang, Xu, Jianhua, Sun, Haoze, Lin, Mingan, Zhou, Zenan, Chen, Weipeng
We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B of high-quality data (text, audio, and vision). Second, we design an audio tokenizer (Baichuan-Audio-Tokenizer) to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with the MLLM. Lastly, we design a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT-4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.
SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals
Yan, Runze, Ding, Cheng, Xiao, Ran, Fedorov, Aleksandr, Lee, Randall J, Nahab, Fadi, Hu, Xiao
Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambulatory settings. Conventional approaches typically discard corrupted segments or attempt to reconstruct the original signals, allowing for the use of standard machine learning techniques. However, this reduces dataset size and introduces biases, compromising prediction accuracy and the effectiveness of continuous monitoring. We propose a novel deep learning model, Signal Quality Weighted Fusion of Attentional Convolution and Recurrent Neural Network (SQUWA), designed to retain accurate predictions from partially corrupted PPG. Specifically, SQUWA integrates an attention mechanism that directly considers signal quality during the learning process, dynamically adjusting the weights of time series segments based on their quality. This enhances the influence of higher-quality segments while reducing that of lower-quality ones, effectively utilizing partially corrupted segments. This design departs from conventional methods that exclude such segments, enabling the use of a broader range of data, with important implications for less disruptive AF risk monitoring and more accurate estimation of AF burden. Our extensive experiments show that SQUWA outperforms existing PPG-based models, achieving the highest AUCPR of 0.89 with label noise mitigation. This also exceeds the 0.86 AUCPR of models trained using both electrocardiogram (ECG) and PPG data.
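The following minimal PyTorch sketch illustrates the core quality-weighting idea, assuming per-segment quality scores are available from some signal-quality index; it shows only the attention weighting, not SQUWA's full attentional-convolution and recurrent architecture.

# Illustrative sketch of signal-quality-aware attention pooling: per-segment
# quality scores bias the attention logits so higher-quality PPG segments
# dominate the pooled representation. This demonstrates only the weighting
# idea, not SQUWA's complete convolution + recurrent design.
import torch
import torch.nn as nn

class QualityWeightedAttention(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor, quality: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_segments, feat_dim)
        # quality: (batch, n_segments), scores in [0, 1] from a quality index
        logits = self.score(feats).squeeze(-1) + torch.log(quality + 1e-6)
        attn = torch.softmax(logits, dim=-1)            # low quality -> low weight
        return (attn.unsqueeze(-1) * feats).sum(dim=1)  # (batch, feat_dim)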
Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
Liu, Darren, Ding, Cheng, Bold, Delgersuren, Bouvier, Monique, Lu, Jiaying, Shickel, Benjamin, Jabaley, Craig S., Zhang, Wenhui, Park, Soojin, Young, Michael J., Wainwright, Mark S., Clermont, Gilles, Rashidi, Parisa, Rosenthal, Eric S., Dimisko, Laurie, Xiao, Ran, Yoon, Joo Heung, Yang, Carl, Hu, Xiao
The field of healthcare has increasingly turned its focus toward Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks do not fully capture nuanced clinical contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to the other LLMs, while both GPT-3.5 and text-davinci-003 exhibited enhanced performance when appropriate prompting strategies were employed. The GPT family models demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs was developed and operationalized. This framework goes beyond singular performance aspects; with expert annotations, it not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
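A minimal sketch of the concept-assessment step is shown below, using the OpenAI chat completions API; the prompt wording, model name, and output handling are illustrative assumptions, not the study's exact protocol.

# Sketch of the concept-assessment step: ask an LLM whether a MetaMap-extracted
# concept is negated and what its temporality is in the note. The prompt text
# and model choice here are illustrative, not the study's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def assess_concept(note_text: str, concept: str, model: str = "gpt-4") -> str:
    prompt = (
        f"Clinical note:\n{note_text}\n\n"
        f"For the concept '{concept}', answer two questions:\n"
        "1. Negation: is the concept affirmed or negated?\n"
        "2. Temporality: is it current, historical, or hypothetical?"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output eases clinician adjudication
    )
    return resp.choices[0].message.content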
Reconsideration on evaluation of machine learning models in continuous monitoring using wearables
Ding, Cheng, Guo, Zhicheng, Rudin, Cynthia, Xiao, Ran, Nahab, Fadi B, Hu, Xiao
Wearable devices, especially those utilizing the photoplethysmography (PPG) signal, have demonstrated significant potential in providing real-time insights into an individual's health status. PPG, due to its non-invasive nature and ease of integration into wearable technology, has become a cornerstone of modern health monitoring systems [5]. Analyzing wearable device signals often involves ML models of varying complexity [6, 7]. In the model development phase, continuous signals are typically cut into discrete segments, and the model's performance is evaluated at the segment level using conventional metrics such as accuracy, sensitivity, specificity, and F1 score [8]. However, relying solely on these conventional segment-level metrics does not provide a holistic assessment; it hurts consumers, by making it impossible to select the optimal solution for their needs, and innovators, by failing to guide their efforts toward true progress. The complex nature of continuous health monitoring using wearable devices introduces unique challenges beyond the capabilities of conventional evaluation approaches, as illustrated in Figure 1. Recognizing these challenges is imperative for equipping continuous health monitoring applications with accurate and reliable ML models, ensuring a successful translation of these models into everyday use by millions of people and fulfilling the potential of this technology at scale. In the subsequent sections, we outline the challenges in evaluating ML models for continuous health monitoring using wearables, thoroughly review existing evaluation methods and metrics, and propose a standardized evaluation guideline.
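As a minimal illustration of the gap between segment-level and patient-level evaluation, the following Python sketch computes a pooled segment-level F1 alongside per-patient F1 statistics; the data layout (one binary prediction per segment, each tagged with a patient identifier) is an assumption for illustration.

# Sketch contrasting segment-level metrics with patient-level aggregation:
# pooling all segments can mask the fact that errors concentrate in a few
# patients. The per-segment, patient-tagged data layout is assumed.
import numpy as np
from sklearn.metrics import f1_score

def segment_vs_patient_f1(patient_ids, y_true, y_pred):
    patient_ids = np.asarray(patient_ids)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pooled = f1_score(y_true, y_pred)  # conventional segment-level F1
    per_patient = [
        f1_score(y_true[patient_ids == pid], y_pred[patient_ids == pid],
                 zero_division=0)
        for pid in np.unique(patient_ids)
    ]
    # A high pooled score with a low worst-patient score signals that
    # segment-level evaluation alone overstates real-world reliability.
    return pooled, float(np.mean(per_patient)), float(np.min(per_patient))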