Goto

Collaborating Authors

 meld score


Red Teaming Large Language Models for Healthcare

arXiv.org Artificial Intelligence

We present the design process and findings of the pre-conference workshop at the Machine Learning for Healthcare Conference (2024) entitled Red Teaming Large Language Models for Healthcare, which took place on August 15, 2024. Conference participants, comprising a mix of computational and clinical expertise, attempted to discover vulnerabilities -- realistic clinical prompts for which a large language model (LLM) outputs a response that could cause clinical harm. Red-teaming with clinicians enables the identification of LLM vulnerabilities that may not be recognised by LLM developers lacking clinical expertise. We report the vulnerabilities found, categorise them, and present the results of a replication study assessing the vulnerabilities across all LLMs provided.


Arterial blood pressure waveform in liver transplant surgery possesses variability of morphology reflecting recipients' acuity and predicting short term outcomes

arXiv.org Artificial Intelligence

Background: We investigated clinical information underneath the beat-to-beat fluctuation of the arterial blood pressure (ABP) waveform morphology. We proposed the Dynamical Diffusion Map algorithm (DDMap) to quantify the variability of morphology. The underlying physiology could be the compensatory mechanisms involving complex interactions between various physiological mechanisms to regulate the cardiovascular system. As a liver transplant surgery contains distinct periods, we investigated its clinical behavior in different surgical steps. Methods: Our study used DDmap algorithm, based on unsupervised manifold learning, to obtain a quantitative index for the beat-to-beat variability of morphology. We examined the correlation between the variability of ABP morphology and disease acuity as indicated by Model for End-Stage Liver Disease (MELD) scores, the postoperative laboratory data, and 4 early allograft failure (EAF) scores. Results: Among the 85 enrolled patients, the variability of morphology obtained during the presurgical phase was best correlated with MELD-Na scores. The neohepatic phase variability of morphology was associated with EAF scores as well as postoperative bilirubin levels, international normalized ratio, aspartate aminotransferase levels, and platelet count. Furthermore, variability of morphology presents more associations with the above clinical conditions than the common BP measures and their BP variability indices. Conclusions: The variability of morphology obtained during the presurgical phase is indicative of patient acuity, whereas those during the neohepatic phase are indicative of short-term surgical outcomes.


Fairly Predicting Graft Failure in Liver Transplant for Organ Assigning

arXiv.org Artificial Intelligence

Liver transplant is an essential therapy performed for severe liver diseases. The fact of scarce liver resources makes the organ assigning crucial. Model for End-stage Liver Disease (MELD) score is a widely adopted criterion when making organ distribution decisions. However, it ignores post-transplant outcomes and organ/donor features. These limitations motivate the emergence of machine learning (ML) models. Unfortunately, ML models could be unfair and trigger bias against certain groups of people. To tackle this problem, this work proposes a fair machine learning framework targeting graft failure prediction in liver transplant. Specifically, knowledge distillation is employed to handle dense and sparse features by combining the advantages of tree models and neural networks. A two-step debiasing method is tailored for this framework to enhance fairness. Experiments are conducted to analyze unfairness issues in existing models and demonstrate the superiority of our method in both prediction and fairness performance.


Inverse Contextual Bandits: Learning How Behavior Evolves over Time

arXiv.org Machine Learning

Understanding an agent's priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. While conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving, and clinical professionals are constantly fine-tuning their priorities. We desire an approach to policy learning that provides (1) interpretable representations of decision-making, accounts for (2) non-stationarity in behavior, as well as operating in an (3) offline manner. First, we model the behavior of learning agents in terms of contextual bandits, and formalize the problem of inverse contextual bandits (ICB). Second, we propose two algorithms to tackle ICB, each making varying degrees of assumptions regarding the agent's learning strategy. Finally, through both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as validating its accuracy.


Multi-Task Survival Analysis of Liver Transplantation Using Deep Learning

AAAI Conferences

In this paper, we present the application of deep learning techniques to develop a modern model for the prediction of graft failure and survival analysis in liver transplant patients. We trained our model using the United Network for Organ Sharing (UNOS) dataset consisting of 59,115 patients from year 2002 to 2016 with around 150 features each. We also compare our model against an- other dataset – Scientific Registry of Transplant Recipients (SRTR) including 87,334 patients from year 2002 to 2018 – after selecting features by mapping them from UNOS data. Some of the most important features common to both datasets are Model for End-stage Liver Disease (MELD) score, patient body mass index (BMI), donor and patient age, cold ischemia time, and levels of various chemicals within the patient. To provide an additional tool to clinical practitioners in the allocation of a scarce resource, we developed a multi-task model to learn the survival function of a donor-recipient pair and hence predict the exact time of failure which outper- forms the traditional cox hazard models. The multi-task model produces very promising C-index results of 0.82 and 0.57 on the SRTR and UNOS datasets respectively.