Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jin, Jiarui, Wang, Haoyu, Li, Hongyan, Li, Jun, Pan, Jiahui, Hong, Shenda
–arXiv.org Artificial Intelligence
Electrocardiogram (ECG) is essential for the clinical diagnosis of arrhythmias and other heart diseases, but deep learning methods based on ECG often face limitations due to the need for high-quality annotations. Although previous ECG self-supervised learning (eSSL) methods have made significant progress in representation learning from unannotated ECG data, they typically treat ECG signals as ordinary time-series data, segmenting the signals using fixed-size and fixed-step time windows, which often ignore the form and rhythm characteristics and latent semantic relationships in ECG signals. In this work, we introduce a novel perspective on ECG signals, treating heartbeats as words and rhythms as sentences. Based on this perspective, we first designed the QRS-Tokenizer, which generates semantically meaningful ECG sentences from the raw ECG signals. Building on these, we then propose HeartLang, a novel self-supervised learning framework for ECG language processing, learning general representations at form and rhythm levels. Additionally, we construct the largest heartbeat-based ECG vocabulary to date, which will further advance the development of ECG language processing. We evaluated HeartLang across six public ECG datasets, where it demonstrated robust competitiveness against other eSSL methods. Our data and code are publicly available at https://github.com/PKUDigitalHealth/HeartLang. Electrocardiogram (ECG) is a common type of clinical data used to monitor cardiac activity, and is frequently employed in diagnosing cardiac diseases or conditions impairing myocardial function (Hong et al., 2020; Liu et al., 2021). A primary limitation of using supervised deep learning methods for ECG signal analysis is their dependency on largescale, expert-reviewed, annotated high-quality data.
arXiv.org Artificial Intelligence
Feb-15-2025