NeuroLingua: A Language-Inspired Hierarchical Framework for Multimodal Sleep Stage Classification Using EEG and EOG

Mahdi Samaee, Mehran Yazdi, Daniel Massicotte

arXiv.org Artificial Intelligence 

We propose NeuroLingua, a language-inspired framework that conceptualizes sleep as a structured physiological language. Each 30-second epoch is decomposed into overlapping 3-second subwindows ("tokens") using a CNN-based tokenizer, enabling hierarchical temporal modeling through dual-level Transformers: intra-segment encoding of local dependencies and inter-segment integration across seven consecutive epochs (3.5 minutes) for extended context. Modality-specific embeddings from EEG and EOG channels are fused via a Graph Convolutional Network, facilitating robust multimodal integration. NeuroLingua is evaluated on the Sleep-EDF Expanded and ISRUC-Sleep datasets, achieving state-of-the-art results on Sleep-EDF (85.3% accuracy, 0.800 macro F1, and 0.796 Cohen's κ), and competitive performance on ISRUC (81.9% accuracy, 0.802 macro F1, and 0.755 κ), matching or exceeding published baselines in overall and per-class metrics. The architecture's attention mechanisms enhance the detection of clinically relevant sleep microevents, providing a principled foundation for future interpretability, explainability and causal inference in sleep research. By framing sleep as a compositional language, NeuroLingua unifies hierarchical sequence modeling and multimodal fusion, advancing automated sleep staging toward more transparent and clinically meaningful applications.

Index Terms: Sleep staging, EEG, EOG, Polysomnography, Deep learning, Hierarchical sequence modeling, Multimodal fusion, Transformers, Graph neural networks, Interpretability, Explainability, Causal inference.
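The two temporal levels described in the abstract can be illustrated with a minimal Python sketch of the windowing alone. It covers only how a 30-second epoch might be cut into overlapping 3-second subwindows and how seven consecutive epochs might be grouped into one 3.5-minute context. The abstract states neither the sampling rate nor the amount of overlap, so the 100 Hz rate and the 1.5-second stride (50% overlap) are assumptions, and the function names are hypothetical. The CNN tokenizer, the Transformers, and the graph-based fusion are not reproduced here.

```python
import numpy as np

EPOCH_SECONDS = 30     # epoch length, from the abstract
TOKEN_SECONDS = 3      # subwindow ("token") length, from the abstract
CONTEXT_EPOCHS = 7     # consecutive epochs per context, from the abstract
SAMPLE_RATE_HZ = 100   # ASSUMPTION: not stated in the abstract
STRIDE_SECONDS = 1.5   # ASSUMPTION: 50% overlap, not stated in the abstract


def split_into_subwindows(epoch, fs=SAMPLE_RATE_HZ,
                          win_s=TOKEN_SECONDS, stride_s=STRIDE_SECONDS):
    """Cut one epoch of shape (channels, samples) into overlapping subwindows.

    Returns an array of shape (n_tokens, channels, win_samples).
    """
    win = int(round(win_s * fs))
    stride = int(round(stride_s * fs))
    n_samples = epoch.shape[-1]
    if n_samples < win:
        raise ValueError("epoch is shorter than one subwindow")
    starts = range(0, n_samples - win + 1, stride)
    return np.stack([epoch[..., s:s + win] for s in starts])


def group_into_contexts(epochs, n_context=CONTEXT_EPOCHS):
    """Group consecutive epochs into sliding contexts of n_context epochs.

    epochs has shape (n_epochs, ...); the result has shape
    (n_epochs - n_context + 1, n_context, ...).
    """
    n_groups = epochs.shape[0] - n_context + 1
    if n_groups <= 0:
        raise ValueError("not enough epochs for one context")
    return np.stack([epochs[i:i + n_context] for i in range(n_groups)])


# Synthetic stand-in for a recording: 10 epochs, 2 channels (e.g. EEG + EOG).
rng = np.random.default_rng(0)
recording = rng.standard_normal((10, 2, EPOCH_SECONDS * SAMPLE_RATE_HZ))

tokens = split_into_subwindows(recording[0])
contexts = group_into_contexts(recording)
context_minutes = CONTEXT_EPOCHS * EPOCH_SECONDS / 60
```

Under these assumed settings, each epoch yields 19 subwindows of 300 samples, and a context of seven epochs spans 7 × 30 s = 3.5 minutes, matching the figure quoted in the abstract. A different stride would change only the number of tokens per epoch.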