 patient cluster


Learning a Distance for the Clustering of Patients with Amyotrophic Lateral Sclerosis

arXiv.org Artificial Intelligence

Amyotrophic lateral sclerosis (ALS) is a severe disease with a typical survival of 3-5 years after symptom onset. Current treatments offer only limited life extension, and the variability in patient responses highlights the need for personalized care. However, research is hindered by small, heterogeneous cohorts, sparse longitudinal data, and the lack of a clear definition for clinically meaningful patient clusters. Existing clustering methods remain limited in both scope and number. To address this, we propose a clustering approach that groups sequences using a disease progression declarative score. Our approach integrates medical expertise through multiple descriptive variables, investigating several distance measures that combine these variables, both by reusing off-the-shelf distances and by employing a weakly supervised learning method. We pair these distances with clustering methods and benchmark them against state-of-the-art techniques. The evaluation of our approach on a dataset of 353 ALS patients from the University Hospital of Tours shows that our method outperforms state-of-the-art methods in survival analysis while achieving comparable silhouette scores. In addition, the learned distances enhance the relevance and interpretability of results for medical experts.
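The idea of combining several descriptive variables into one patient distance and then scoring a clustering with the silhouette coefficient can be illustrated with a minimal sketch. This is not the paper's method: the weighted-sum form of `combined_distance`, the fixed weights, and the toy patients are all assumptions made for illustration, and the weakly supervised learning of the distance is not reproduced.

```python
def combined_distance(a, b, weights):
    """Weighted sum of per-variable absolute differences (assumed form)."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def silhouette(points, labels, dist):
    """Mean silhouette coefficient of a labelling under a custom distance."""
    scores = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of the same cluster
        same = [dist(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Made-up patients: (progression-score slope, scaled age at onset)
patients = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
weights = (0.7, 0.3)  # hypothetical weights, not learned values


def dist(a, b):
    return combined_distance(a, b, weights)


good = silhouette(patients, [0, 0, 1, 1], dist)  # natural grouping
bad = silhouette(patients, [0, 1, 0, 1], dist)   # deliberately mixed grouping
print(f"good={good:.2f} bad={bad:.2f}")
```

Changing the weights changes which patients count as close, which is why a learned distance can shift both the clusters and their silhouette scores.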



Interpreting deep embeddings for disease progression clustering

arXiv.org Artificial Intelligence

We propose a novel approach for interpreting deep embeddings in the context of patient clustering. We evaluate our approach on a dataset of participants with type 2 diabetes from the UK Biobank, and demonstrate clinically meaningful insights into disease progression patterns.


Enabling scalable clinical interpretation of ML-based phenotypes using real world data

arXiv.org Artificial Intelligence

The availability of large and deep electronic healthcare records (EHR) datasets has the potential to enable a better understanding of real-world patient journeys, and to identify novel subgroups of patients. ML-based aggregation of EHR data is mostly tool-driven, i.e., building on available or newly developed methods. However, these methods, their input requirements, and, importantly, their resulting output are frequently difficult to interpret, especially without in-depth data science or statistical training. This endangers the final step of analysis, where an actionable and clinically meaningful interpretation is needed. This study investigates approaches to perform patient stratification analysis at scale using large EHR datasets and multiple clustering methods for clinical research. We have developed several tools to facilitate the clinical evaluation and interpretation of unsupervised patient stratification results, namely pattern screening, meta clustering, surrogate modeling, and curation. These tools can be used at different stages within the analysis. Compared to a standard analysis approach, we demonstrate the ability to condense results and optimize analysis time. In the case of meta clustering, we demonstrate that the number of patient clusters can be reduced from 72 to 3 in one example. In another stratification result, by using surrogate models, we could quickly identify that heart failure patients were stratified according to whether blood sodium measurements were available. As this is a routine measurement performed for all patients with heart failure, this indicated a data bias. By using further cohort and feature curation, these patients and other irrelevant features could be removed to increase the clinical meaningfulness. These examples show the effectiveness of the proposed methods, and we hope to encourage further research in this field.
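The surrogate-modeling step described above can be sketched as fitting a very simple, readable model that predicts cluster membership from the input features, then checking which feature explains the clusters. The sketch below uses a one-split "stump" as the surrogate; the feature names, the data, and the helper functions are hypothetical and are not taken from the study's tooling.

```python
from collections import Counter


def stump_accuracy(rows, labels, feature):
    """Accuracy of a one-split surrogate: predict the majority cluster
    for each observed value of a single feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    correct = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return correct / len(labels)


def best_surrogate_feature(rows, labels):
    """Feature whose one-split surrogate best reproduces the clustering."""
    return max(rows[0], key=lambda f: stump_accuracy(rows, labels, f))


# Made-up heart failure patients and their cluster assignments
rows = [
    {"sodium_measured": True, "age_over_70": True},
    {"sodium_measured": True, "age_over_70": False},
    {"sodium_measured": False, "age_over_70": True},
    {"sodium_measured": False, "age_over_70": False},
    {"sodium_measured": True, "age_over_70": True},
    {"sodium_measured": False, "age_over_70": False},
]
clusters = [1, 1, 0, 0, 1, 0]

feature = best_surrogate_feature(rows, clusters)
accuracy = stump_accuracy(rows, clusters, feature)
print(feature, accuracy)
```

When a data-availability flag reproduces the clustering this well, the clusters probably reflect how the data were recorded rather than a clinical phenotype, which is the kind of bias the abstract reports.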


Outcome-Driven Clustering of Acute Coronary Syndrome Patients using Multi-Task Neural Network with Attention

arXiv.org Machine Learning

Cluster analysis aims at separating patients into phenotypically heterogeneous groups and defining therapeutically homogeneous patient subclasses. It is an important approach in data-driven disease classification and subtyping. Acute coronary syndrome (ACS) is a syndrome caused by a sudden decrease of coronary artery blood flow, where disease classification would help to inform therapeutic strategies and provide prognostic insights. Here we conducted an outcome-driven cluster analysis of ACS patients, which jointly considers treatment and patient outcome as indicators for patient state. A multi-task neural network with attention was used as the modeling framework, including learning of the patient state, cluster analysis, and feature importance profiling. Seven patient clusters were discovered. The clusters have different characteristics, as well as different risk profiles for the outcome of in-hospital major adverse cardiac events. The results demonstrate that cluster analysis using an outcome-driven multi-task neural network is promising for patient classification and subtyping.


An interpretable multiple kernel learning approach for the discovery of integrative cancer subtypes

arXiv.org Machine Learning

Due to the complexity of cancer, clustering algorithms have been used to disentangle the observed heterogeneity and identify cancer subtypes that can be treated specifically. While kernel-based clustering approaches allow the use of more than one input matrix, which is an important factor when considering a multidimensional disease like cancer, the clustering results remain hard to evaluate and, in many cases, it is unclear which piece of information had which impact on the final result. In this paper, we propose an extension of multiple kernel learning clustering that enables the characterization of each identified patient cluster based on the features that had the highest impact on the result. To this end, we combine feature clustering with multiple kernel dimensionality reduction and introduce FIPPA, a score which measures the feature cluster impact on a patient cluster. Results: We applied the approach to different cancer types described by four different data types with the aim of identifying integrative patient subtypes and understanding which features were most important for their identification. Our results show that our method not only achieves state-of-the-art performance according to standard measures (e.g., survival analysis), but also, based on the high-impact features, produces meaningful explanations for the molecular bases of the subtypes. This could provide an important step in the validation of potential cancer subtypes and enable the formulation of new hypotheses concerning individual patient groups. Similar analyses are possible for other disease phenotypes.


Clustering-Aided Approach for Predicting Patient Outcomes with Application to Elderly Healthcare in Ireland

AAAI Conferences

Predictive analytics has shown promising capabilities and opportunities for many aspects of healthcare practice. Data-driven insights can provide an important part of the solution for curbing rising costs and improving care quality. The paper implements machine learning techniques in an attempt to support decision making in relation to elderly healthcare in Ireland, with a particular focus on hip fracture care. We adopt a combination of unsupervised and supervised learning for predicting patient outcomes. Initially, elderly patients are grouped based on the similarity of age, length of stay (LOS) and elapsed time to surgery. Using the K-Means algorithm, our clustering experiments suggest the presence of three coherent clusters of patients. Subsequently, the discovered clusters are utilised to train prediction models, each of which addresses a particular cluster of patients individually. In particular, two machine learning models are trained for every cluster of patients in order to predict the inpatient LOS and the discharge destination. The developed models are claimed to make predictions with relatively high accuracy. Furthermore, the potential usefulness of the clustering-guided approach to prediction is discussed in general.
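The cluster-then-predict pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's pipeline: the data are invented, only two clusters are used instead of three, LOS is held out as the target rather than used as a clustering feature, and a per-cluster mean stands in for the trained machine learning models.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean) for each point."""
    return [
        min(range(len(centroids)),
            key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
        for p in points
    ]


def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's algorithm with caller-supplied starting centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return centroids, assign(points, centroids)


# Made-up patients: (scaled age, scaled time to surgery) and LOS in days
features = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9), (0.9, 0.85)]
los_days = [5, 7, 20, 24]

centroids, labels = kmeans(features, [features[0], features[2]])

# One trivial "model" per cluster: the mean LOS of its members
cluster_mean_los = {}
for k in set(labels):
    values = [d for d, l in zip(los_days, labels) if l == k]
    cluster_mean_los[k] = sum(values) / len(values)


def predict_los(patient):
    """Route a new patient to the nearest cluster, then use its model."""
    return cluster_mean_los[assign([patient], centroids)[0]]


prediction = predict_los((0.85, 0.8))
print(labels, prediction)
```

The design point is the routing step: each new patient is first assigned to a cluster, and only that cluster's model produces the prediction.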