 physiological data






StressID: a Multimodal Dataset for Stress Identification

Neural Information Processing Systems

Total size: 5.29 GB. Total duration across subjects and across tasks: physiological 1119 min; video 918 min; audio 385 min. Figure 1: A dataset summary card for StressID, constructed based on [2, 5]. Figure 2: Organisation of the


A Unified AI Approach for Continuous Monitoring of Human Health and Diseases from Intensive Care Unit to Home with Physiological Foundation Models (UNIPHY+)

arXiv.org Artificial Intelligence

We present UNIPHY+, a unified physiological foundation model (physioFM) framework designed to enable continuous monitoring of human health and disease across care settings using ubiquitously obtainable physiological data. We propose novel strategies for incorporating contextual information during pretraining, fine-tuning, and lightweight model personalization via multi-modal learning, feature fusion-tuning, and knowledge distillation. We advocate testing UNIPHY+ with a broad set of use cases, from intensive care to ambulatory monitoring, in order to demonstrate that UNIPHY+ can empower generalizable, scalable, and personalized physiological AI to support both clinical decision-making and long-term health monitoring.


Distinguishing Startle from Surprise Events Based on Physiological Signals

arXiv.org Artificial Intelligence

Unexpected events can impair attention and delay decision-making, posing serious safety risks in high-risk environments such as aviation. In particular, reactions like startle and surprise can impact pilot performance in different ways, yet are often hard to distinguish in practice. Existing research has largely studied these reactions separately, with limited focus on their combined effects or how to differentiate them using physiological data. In this work, we address this gap by distinguishing between startle and surprise events based on physiological signals using machine learning and multi-modal fusion strategies. Our results demonstrate that these events can be reliably predicted, achieving a highest mean accuracy of 85.7% with SVM and Late Fusion. To further validate the robustness of our model, we extended the evaluation to include a baseline condition, successfully differentiating between Startle, Surprise, and Baseline states with a highest mean accuracy of 74.9% with XGBoost and Late Fusion.
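The abstract above names late fusion but does not describe how the per-modality outputs are combined. As a minimal sketch of one common form of late fusion (averaging the class probabilities that separate per-modality classifiers produce), the following may help. The modality names, the probability values, and the `late_fusion` helper are illustrative assumptions, not the authors' pipeline.

```python
from typing import Dict, List, Tuple


def late_fusion(prob_by_modality: Dict[str, List[float]]) -> Tuple[int, List[float]]:
    """Average per-modality class probabilities and pick the most likely class.

    Each value is the probability vector that one modality's own classifier
    (for example an SVM with probability outputs) produced for one sample.
    """
    vectors = list(prob_by_modality.values())
    if not vectors:
        raise ValueError("at least one modality is required")
    n_classes = len(vectors[0])
    if any(len(v) != n_classes for v in vectors):
        raise ValueError("all modalities must score the same set of classes")

    # Decision-level fusion: unweighted mean of the class probabilities.
    fused = [sum(v[c] for v in vectors) / len(vectors) for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: fused[c])
    return label, fused


# Hypothetical outputs for one event; class 0 = startle, class 1 = surprise.
example = {
    "ecg": [0.60, 0.40],
    "eda": [0.30, 0.70],
    "respiration": [0.45, 0.55],
}
label, fused = late_fusion(example)
print(label, fused)
```

Averaging is only one option; weighted averaging or majority voting over per-modality decisions are equally valid forms of late fusion, and the abstract does not say which was used.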


UniPhyNet: A Unified Network For Multimodal Physiological Raw Signal Classification

arXiv.org Machine Learning

We present UniPhyNet, a novel neural network architecture to classify cognitive load using multimodal physiological data -- specifically EEG, ECG and EDA signals -- without the explicit need for extracting hand-crafted features. UniPhyNet integrates multiscale parallel convolutional blocks and ResNet-type blocks enhanced with a channel block attention module to focus on the informative features, while a bidirectional gated recurrent unit is used to capture temporal dependencies. This architecture processes and combines signals in both unimodal and multimodal configurations via intermediate fusion of learned feature maps. On the CL-Drive dataset, UniPhyNet improves raw signal classification accuracy from 70% to 80% (binary) and 62% to 74% (ternary), outperforming feature-based models and demonstrating its effectiveness as an end-to-end solution for real-world cognitive state monitoring.


Transformer representation learning is necessary for dynamic multi-modal physiological data on small-cohort patients

arXiv.org Artificial Intelligence

Bingxu Wang, Min Ge, Kunzhi Cai, Yuqi Zhang, Zeyi Zhou, Wenjiao Li, Yachong Guo, Wei Wang, and Qing Zhou. Department of Thoracic and Cardiovascular Surgery, The Affiliated Drum Tower Hospital of Nanjing University Medical School; Kuang Yaming Honors School, Nanjing University, Nanjing 210023, China; National Laboratory of Solid State Microstructure, Department of Physics, Nanjing University, Nanjing 210093, China. E-mail: yguo@nju.edu.cn

Abstract: Postoperative delirium (POD), a severe neuropsychiatric complication affecting nearly 50% of high-risk surgical patients, is defined as an acute disorder of attention and cognition. It remains significantly underdiagnosed in intensive care units (ICUs) due to subjective monitoring methods. Early and accurate diagnosis of POD is critical and achievable. Here, we propose a POD prediction framework comprising a Transformer representation model followed by traditional machine learning algorithms. We curated the first multi-modal POD dataset encompassing two patient types and evaluated various Transformer architectures for representation learning. Empirical results indicate consistent improvements in sensitivity and Youden index in patient TYPE I using Transformer representations, particularly our fusion adaptation of Pathformer. By enabling effective delirium diagnosis from postoperative day 1 to 3, our extensive experimental findings emphasize the potential of multi-modal physiological data and highlight the necessity of representation learning via multi-modal Transformer architecture in clinical diagnosis.

Introduction: Postoperative delirium (POD), a prevalent acute neuropsychiatric syndrome [1,2], affects more than 50% of surgical patients and significantly elevates morbidity and mortality risks [3]. Early identification is crucial yet challenging [4], primarily due to subjective assessment criteria and incomplete understanding of underlying pathophysiological mechanisms [5].
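The abstract reports sensitivity and the Youden index without defining them. As a brief reminder, the Youden index is J = sensitivity + specificity - 1, ranging from -1 to 1, with 0 corresponding to a test no better than chance. The sketch below computes it from confusion-matrix counts; the counts and the `youden_index` helper are made up for illustration and are not taken from the paper.

```python
def youden_index(tp: int, fn: int, tn: int, fp: int) -> float:
    """Return Youden's J = sensitivity + specificity - 1 from confusion counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of true cases detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly ruled out
    return sensitivity + specificity - 1.0


# Hypothetical counts: 50 delirium cases and 50 non-delirium cases.
j = youden_index(tp=40, fn=10, tn=45, fp=5)
print(round(j, 3))
```

Because J weights sensitivity and specificity equally, it is a convenient single number for comparing classifiers on small, imbalanced clinical cohorts where accuracy alone can be misleading.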


An Attentive Dual-Encoder Framework Leveraging Multimodal Visual and Semantic Information for Automatic OSAHS Diagnosis

arXiv.org Artificial Intelligence

Obstructive sleep apnea-hypopnea syndrome (OSAHS) [1] affects about 27% of adults [2], causing poor sleep, daytime dysfunction, and higher risks of cardiovascular diseases and diabetes [3]. The standard diagnostic method, polysomnography (PSG) [4], is complex, costly, and uncomfortable, requiring multi-channel monitoring (EEG, ECG, heart rate [5]) and trained technicians (Figure 1). Data-driven methods for automated OSAHS diagnosis can improve efficiency and reduce costs. Facial features like a flat nasal bridge, wide jawbone, thick neck, and mandibular retrognathia correlate with OSAHS severity [6], providing visual indicators of airway obstruction and sleep disturbances. Deep learning can analyze these features for early diagnosis and personalized treatment.

Our key contributions are as follows: (1) Introducing VTA-OSAHS, a multimodal framework for diagnosing OSAHS severity by combining visual and language data, and using a pre-trained language model to extract key information from basic physiological data for improved classification accuracy; (2) Developing a visual encoder that focuses on specific facial features associated with OSAHS, employing attention mesh and stochastic gates for better clinical decision alignment; (3) Implementing a data pre-processing strategy to handle imbalanced samples and ordinal classification, using randomOverSampler (ROS) [17] and an ordinal regression loss function [18] to enhance accuracy and robustness; (4) Demonstrating
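The pre-processing step mentioned in the text, random oversampling of under-represented classes, can be illustrated with a small standard-library sketch. This is not the authors' code and not the imbalanced-learn implementation; the `random_oversample` helper, the sample identifiers, and the severity labels are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def random_oversample(
    samples: Sequence, labels: Sequence[Hashable], seed: int = 0
) -> Tuple[List, List]:
    """Duplicate randomly chosen minority-class samples until classes are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical severity grades: 0 = mild, 1 = moderate, 2 = severe.
x = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
y = [0, 0, 0, 0, 1, 2, 2]
x_bal, y_bal = random_oversample(x, y)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only; duplicating samples before the train/test split would leak copies of test cases into training.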