healthcare domain
Intelligent Understanding of Large Language Models in Traditional Chinese Medicine Based on Prompt Engineering Framework
Chen, Yirui, Xiao, Qinyu, Yi, Jia, Chen, Jing, Wang, Mengyang
This paper explores the application of prompt engineering to enhance the performance of large language models (LLMs) in the domain of Traditional Chinese Medicine (TCM). We propose TCM-Prompt, a framework that integrates various pre-trained language models (PLMs), templates, tokenization, and verbalization methods, allowing researchers to easily construct and fine-tune models for specific TCM-related tasks. We conducted experiments on disease classification, syndrome identification, herbal medicine recommendation, and general NLP tasks, demonstrating the effectiveness and superiority of our approach compared to baseline methods. Our findings suggest that prompt engineering is a promising technique for improving the performance of LLMs in specialized domains like TCM, with potential applications in digitalization, modernization, and personalized medicine.
On The Persona-based Summarization of Domain-Specific Documents
Mullick, Ankan, Bose, Sombit, Saha, Rounak, Bhowmick, Ayan Kumar, Goyal, Pawan, Ganguly, Niloy, Dey, Prasenjit, Kokku, Ravi
In an ever-expanding world of domain-specific knowledge, the increasing complexity of consuming, and storing information necessitates the generation of summaries from large information repositories. However, every persona of a domain has different requirements of information and hence their summarization. For example, in the healthcare domain, a persona-based (such as Doctor, Nurse, Patient etc.) approach is imperative to deliver targeted medical information efficiently. Persona-based summarization of domain-specific information by humans is a high cognitive load task and is generally not preferred. The summaries generated by two different humans have high variability and do not scale in cost and subject matter expertise as domains and personas grow. Further, AI-generated summaries using generic Large Language Models (LLMs) may not necessarily offer satisfactory accuracy for different domains unless they have been specifically trained on domain-specific data and can also be very expensive to use in day-to-day operations. Our contribution in this paper is two-fold: 1) We present an approach to efficiently fine-tune a domain-specific small foundation LLM using a healthcare corpus and also show that we can effectively evaluate the summarization quality using AI-based critiquing. 2) We further show that AI-based critiquing has good concordance with Human-based critiquing of the summaries. Hence, such AI-based pipelines to generate domain-specific persona-based summaries can be easily scaled to other domains such as legal, enterprise documents, education etc. in a very efficient and cost-effective manner.
Effects of Different Prompts on the Quality of GPT-4 Responses to Dementia Care Questions
Li, Zhuochun, Xie, Bo, Hilsabeck, Robin, Aguirre, Alyssa, Zou, Ning, Luo, Zhimeng, He, Daqing
Evidence suggests that different prompts lead large language models (LLMs) to generate responses with varying quality. Yet, little is known about prompts' effects on response quality in healthcare domains. In this exploratory study, we address this gap, focusing on a specific healthcare domain: dementia caregiving. We first developed an innovative prompt template with three components: (1) system prompts (SPs) featuring 4 different roles; (2) an initialization prompt; and (3) task prompts (TPs) specifying different levels of details, totaling 12 prompt combinations. Next, we selected 3 social media posts containing complicated, real-world questions about dementia caregivers' challenges in 3 areas: memory loss and confusion, aggression, and driving. We then entered these posts into GPT-4, with our 12 prompts, to generate 12 responses per post, totaling 36 responses. We compared the word count of the 36 responses to explore potential differences in response length. Two experienced dementia care clinicians on our team assessed the response quality using a rating scale with 5 quality indicators: factual, interpretation, application, synthesis, and comprehensiveness (scoring range: 0-5; higher scores indicate higher quality).
A Knowledge Graph-Based Search Engine for Robustly Finding Doctors and Locations in the Healthcare Domain
Kejriwal, Mayank, Haidarian, Hamid, Chiu, Min-Hsueh, Xiang, Andy, Shrestha, Deep, Javed, Faizan
Efficiently finding doctors and locations is an important search problem for patients in the healthcare domain, for which traditional information retrieval methods tend not to work optimally. In the last ten years, knowledge graphs (KGs) have emerged as a powerful way to combine the benefits of gleaning insights from semi-structured data using semantic modeling, natural language processing techniques like information extraction, and robust querying using structured query languages like SPARQL and Cypher. In this short paper, we present a KG-based search engine architecture for robustly finding doctors and locations in the healthcare domain. Early results demonstrate that our approach can lead to significantly higher coverage for complex queries without degrading quality.
Towards A Unified Utilitarian Ethics Framework for Healthcare Artificial Intelligence
Emdad, Forhan Bin, Ho, Shuyuan Mary, Ravuri, Benhur, Hussain, Shezin
Artificial Intelligence (AI) aims to elevate healthcare to a pinnacle by aiding clinical decision support. Overcoming the challenges related to the design of ethical AI will enable clinicians, physicians, healthcare professionals, and other stakeholders to use and trust AI in healthcare settings. This study attempts to identify the major ethical principles influencing the utility performance of AI at different technological levels such as data access, algorithms, and systems through a thematic analysis. We observed that justice, privacy, bias, lack of regulations, risks, and interpretability are the most important principles to consider for ethical AI. This data-driven study has analyzed secondary survey data from the Pew Research Center (2020) of 36 AI experts to categorize the top ethical principles of AI design. To resolve the ethical issues identified by the meta-analysis and domain experts, we propose a new utilitarian ethics-based theoretical framework for designing ethical AI for the healthcare domain.
Sample Size in Natural Language Processing within Healthcare Research
Chaturvedi, Jaya, Shamsutdinova, Diana, Zimmer, Felix, Velupillai, Sumithra, Stahl, Daniel, Stewart, Robert, Roberts, Angus
Sample size calculation is an essential step in most data-based disciplines. Large enough samples ensure representativeness of the population and determine the precision of estimates. This is true for most quantitative studies, including those that employ machine learning methods, such as natural language processing, where free-text is used to generate predictions and classify instances of text. Within the healthcare domain, the lack of sufficient corpora of previously collected data can be a limiting factor when determining sample sizes for new studies. This paper tries to address the issue by making recommendations on sample sizes for text classification tasks in the healthcare domain. Models trained on the MIMIC-III database of critical care records from Beth Israel Deaconess Medical Center were used to classify documents as having or not having Unspecified Essential Hypertension, the most common diagnosis code in the database. Simulations were performed using various classifiers on different sample sizes and class proportions. This was repeated for a comparatively less common diagnosis code within the database of diabetes mellitus without mention of complication. Smaller sample sizes resulted in better results when using a K-nearest neighbours classifier, whereas larger sample sizes provided better results with support vector machines and BERT models. Overall, a sample size larger than 1000 was sufficient to provide decent performance metrics. The simulations conducted within this study provide guidelines that can be used as recommendations for selecting appropriate sample sizes and class proportions, and for predicting expected performance, when building classifiers for textual healthcare data. The methodology used here can be modified for sample size estimates calculations with other datasets.
Fairness and Privacy in Federated Learning and Their Implications in Healthcare
Annapareddy, Navya, Preston, Jade, Fox, Judy
Currently, many contexts exist where distributed learning is difficult or otherwise constrained by security and communication limitations. One common domain where this is a consideration is in Healthcare where data is often governed by data-use-ordinances like HIPAA. On the other hand, larger sample sizes and shared data models are necessary to allow models to better generalize on account of the potential for more variability and balancing underrepresented classes. Federated learning is a type of distributed learning model that allows data to be trained in a decentralized manner. This, in turn, addresses data security, privacy, and vulnerability considerations as data itself is not shared across a given learning network nodes. Three main challenges to federated learning include node data is not independent and identically distributed (iid), clients requiring high levels of communication overhead between peers, and there is the heterogeneity of different clients within a network with respect to dataset bias and size. As the field has grown, the notion of fairness in federated learning has also been introduced through novel implementations. Fairness approaches differ from the standard form of federated learning and also have distinct challenges and considerations for the healthcare domain. This paper endeavors to outline the typical lifecycle of fair federated learning in research as well as provide an updated taxonomy to account for the current state of fairness in implementations. Lastly, this paper provides added insight into the implications and challenges of implementing and supporting fairness in federated learning in the healthcare domain.
Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Wang, Yuqing, Zhao, Yun, Petzold, Linda
Large language models (LLMs) have made significant progress in various domains, including healthcare. However, the specialized nature of clinical language understanding tasks presents unique challenges and limitations that warrant further investigation. In this study, we conduct a comprehensive evaluation of state-of-the-art LLMs, namely GPT-3.5, GPT-4, and Bard, within the realm of clinical language understanding tasks. These tasks span a diverse range, including named entity recognition, relation extraction, natural language inference, semantic textual similarity, document classification, and question-answering. We also introduce a novel prompting strategy, self-questioning prompting (SQP), tailored to enhance LLMs' performance by eliciting informative questions and answers pertinent to the clinical scenarios at hand. Our evaluation underscores the significance of task-specific learning strategies and prompting techniques for improving LLMs' effectiveness in healthcare-related tasks. Additionally, our in-depth error analysis on the challenging relation extraction task offers valuable insights into error distribution and potential avenues for improvement using SQP. Our study sheds light on the practical implications of employing LLMs in the specialized domain of healthcare, serving as a foundation for future research and the development of potential applications in healthcare settings.
Challenges facing the explainability of age prediction models: case study for two modalities
Spytek, Mikolaj, Hryniewska-Guzik, Weronika, Zygierewicz, Jaroslaw, Rogala, Jacek, Biecek, Przemyslaw
The prediction of age is a challenging task with various practical applications in high-impact fields like the healthcare domain or criminology. Despite the growing number of models and their increasing performance, we still know little about how these models work. Numerous examples of failures of AI systems show that performance alone is insufficient, thus, new methods are needed to explore and explain the reasons for the model's predictions. In this paper, we investigate the use of Explainable Artificial Intelligence (XAI) for age prediction focusing on two specific modalities, EEG signal and lung X-rays. We share predictive models for age to facilitate further research on new techniques to explain models for these modalities.
A Comparison of Methods for Neural Network Aggregation
Deep learning has been successful in the theoretical aspect. For deep learning to succeed in industry, we need to have algorithms capable of handling many inconsistencies appearing in real data. These inconsistencies can have large effects on the implementation of a deep learning algorithm. Artificial Intelligence is currently changing the medical industry. However, receiving authorization to use medical data for training machine learning algorithms is a huge hurdle. A possible solution is sharing the data without sharing the patient information. We propose a multi-party computation protocol for the deep learning algorithm. The protocol enables to conserve both the privacy and the security of the training data. Three approaches of neural networks assembly are analyzed: transfer learning, average ensemble learning, and series network learning. The results are compared to approaches based on data-sharing in different experiments. We analyze the security issues of the proposed protocol. Although the analysis is based on medical data, the results of multi-party computation of machine learning training are theoretical and can be implemented in multiple research areas.