
Collaborating Authors

 Gao, Fengyi


Precision Rehabilitation for Patients Post-Stroke based on Electronic Health Records and Machine Learning

arXiv.org Artificial Intelligence

In this study, we used statistical analysis and machine learning methods to examine whether rehabilitation exercises improve the functional abilities of patients post-stroke, and to forecast that improvement. Our dataset consists of patients' rehabilitation exercises and demographic information recorded in unstructured electronic health record (EHR) data and free-text rehabilitation procedure notes. We collected data for 265 stroke patients from the University of Pittsburgh Medical Center. We employed a pre-existing natural language processing (NLP) algorithm to extract data on rehabilitation exercises and developed a rule-based NLP algorithm to extract Activity Measure for Post-Acute Care (AM-PAC) scores, covering the basic mobility (BM) and applied cognitive (AC) domains, from procedure notes. Changes in AM-PAC scores were classified based on the minimal clinically important difference (MCID), and significance was assessed using Friedman and Wilcoxon tests. To identify impactful exercises, we used Chi-square tests, Fisher's exact tests, and logistic regression for odds ratios. Additionally, we developed five machine learning models, namely logistic regression (LR), AdaBoost (ADB), support vector machine (SVM), gradient boosting (GB), and random forest (RF), to predict functional-ability outcomes. Statistical analyses revealed significant associations between functional improvements and specific exercises. The RF model achieved the best performance in predicting functional outcomes. We identified three rehabilitation exercises that significantly contributed to patients' post-stroke functional improvement in the first two months. Additionally, the successful application of a machine learning model to predict patient-specific functional outcomes underscores the potential for precision rehabilitation.
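
As a rough illustration of the kind of analysis described above, the following sketch classifies AM-PAC score changes against an MCID threshold, runs a Wilcoxon signed-rank test, and fits a random forest classifier. The column names, file name, feature set, and MCID value are all assumptions for illustration, not the values or variables used in the paper.

```python
import pandas as pd
from scipy.stats import wilcoxon
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical extracted-AM-PAC table; the real feature set and MCID come from the study data.
MCID_BM = 4.0  # placeholder minimal clinically important difference for basic mobility
df = pd.read_csv("ampac_cohort.csv")

# Label improvement: 1 if the AM-PAC basic-mobility change meets or exceeds the MCID.
df["bm_change"] = df["ampac_bm_followup"] - df["ampac_bm_baseline"]
df["improved"] = (df["bm_change"] >= MCID_BM).astype(int)

# Paired nonparametric test of whether follow-up scores differ from baseline.
stat, p_value = wilcoxon(df["ampac_bm_followup"], df["ampac_bm_baseline"])
print(f"Wilcoxon signed-rank: statistic={stat:.2f}, p={p_value:.4f}")

# Predict improvement from exercise indicators and demographics (hypothetical feature names).
features = ["age", "exercise_gait_training", "exercise_bed_mobility", "exercise_transfers"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["improved"], test_size=0.2, random_state=0, stratify=df["improved"]
)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
```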


Large Language Models Vote: Prompting for Rare Disease Identification

arXiv.org Artificial Intelligence

The emergence of generative Large Language Models (LLMs) emphasizes the need for accurate and efficient prompting approaches. LLMs are often applied in Few-Shot Learning (FSL) contexts, where tasks are executed with minimal training data. FSL has become popular in many Artificial Intelligence (AI) subdomains, including AI for health. Rare diseases affect a small fraction of the population, so identifying them from clinical notes inherently requires FSL techniques because of limited data availability, and manual data collection and annotation are both expensive and time-consuming. In this paper, we propose Models-Vote Prompting (MVP), a flexible prompting approach for improving the performance of LLM queries in FSL settings. MVP works by prompting multiple LLMs to perform the same task and then taking a majority vote over the resulting outputs. This method achieves better results than any single model in the ensemble on one-shot rare disease identification and classification tasks. We also release a novel rare disease dataset for FSL, available to those who have signed the MIMIC-IV Data Use Agreement (DUA). Furthermore, because MVP prompts each model multiple times, it substantially increases the time needed for manual annotation; to address this, we assess the feasibility of using JSON to automate generative LLM evaluation.
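
A minimal sketch of the majority-vote idea behind MVP is shown below, assuming a hypothetical query_model(model_name, prompt) helper that returns each LLM's raw response; the model names and the JSON-style output parsing are illustrative, not the paper's exact setup.

```python
import json
from collections import Counter

MODELS = ["model_a", "model_b", "model_c"]  # illustrative ensemble members

def query_model(model_name: str, prompt: str) -> str:
    """Hypothetical helper: send the prompt to one LLM and return its raw response."""
    raise NotImplementedError("wire this to your LLM client of choice")

def models_vote(prompt: str) -> str:
    """Prompt every model with the same task and majority-vote over the parsed labels."""
    labels = []
    for name in MODELS:
        raw = query_model(name, prompt)
        try:
            # Requesting JSON output makes responses machine-parsable, which is the
            # automation idea whose feasibility the paper assesses.
            labels.append(json.loads(raw)["label"])
        except (json.JSONDecodeError, KeyError):
            continue  # skip malformed responses
    if not labels:
        return "unknown"
    return Counter(labels).most_common(1)[0][0]
```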


Extracting Physical Rehabilitation Exercise Information from Clinical Notes: a Comparison of Rule-Based and Machine Learning Natural Language Processing Techniques

arXiv.org Artificial Intelligence

However, physical therapy procedures are typically described in unstructured clinical notes, so simple data extraction methods such as database queries cannot obtain sufficient information. Additionally, the language used to describe these procedures can differ between clinicians, sites, and times. A more advanced natural language processing (NLP) algorithm is required to extract this information from clinical notes, but such a method has not yet been developed for this application. In this paper we devise and compare several approaches to extracting information about therapeutic procedures for physical rehabilitation, both to emulate a manual annotation process using named entity recognition (NER) and to categorize descriptions of therapeutic procedures using multi-label sequence classification. Using a set of manually annotated notes as a gold-standard reference, we evaluated the performance of a rule-based algorithm built with the MedTagger software and several machine learning approaches such as logistic regression (LR) and support vector machines (SVM).

Methods

Data Collection

We identified a cohort of patients diagnosed with stroke between January 1st, 2016 and December 31st, 2016 at UPMC. For these patients, we extracted clinical encounter notes created between January 1st, 2016 and December 31st, 2018 from the institutional data warehouse. The study was approved by the University of Pittsburgh's Institutional Review Board (IRB #21040204).
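
As an illustration of the multi-label sequence classification baseline mentioned above (not the authors' exact pipeline), a scikit-learn one-vs-rest logistic regression over procedure-note snippets might look like the sketch below; the snippets, label names, and TF-IDF features are assumptions for demonstration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy snippets and labels; the real corpus is the manually annotated gold-standard notes.
snippets = [
    "Pt ambulated 50 ft with rolling walker, min assist.",
    "Worked on sit-to-stand transfers from wheelchair.",
    "Therapeutic exercise for bilateral lower extremities.",
]
labels = [["gait_training"], ["transfers"], ["therapeutic_exercise"]]  # assumed label set

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)  # multi-label indicator matrix

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),                      # unigram/bigram TF-IDF features
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),   # one binary LR per label
)
clf.fit(snippets, Y)

pred = clf.predict(["Patient practiced walking in hallway with walker."])
print(mlb.inverse_transform(pred))  # with so little toy data, the predicted label set may be empty
```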