medical condition
From Detection to Mitigation: Addressing Bias in Deep Learning Models for Chest X-Ray Diagnosis
Mottez, Clemence, Fay, Louisa, Varma, Maya, Ostmeier, Sophie, Langlotz, Curtis
Deep learning models have shown promise in improving diagnostic accuracy from chest X-rays, but they also risk perpetuating healthcare disparities when performance varies across demographic groups. In this work, we present a comprehensive bias detection and mitigation framework targeting sex, age, and race-based disparities when performing diagnostic tasks with chest X-rays. We extend a recent CNN-XGBoost pipeline to support multi-label classification and evaluate its performance across four medical conditions. We show that replacing the final layer of CNN with an eXtreme Gradient Boosting classifier improves the fairness of the subgroup while maintaining or improving the overall predictive performance. To validate its generalizability, we apply the method to different backbones, namely DenseNet-121 and ResNet-50, and achieve similarly strong performance and fairness outcomes, confirming its model-agnostic design. We further compare this lightweight adapter training method with traditional full-model training bias mitigation techniques, including adversarial training, reweighting, data augmentation, and active learning, and find that our approach offers competitive or superior bias reduction at a fraction of the computational cost. Finally, we show that combining eXtreme Gradient Boosting retraining with active learning yields the largest reduction in bias across all demographic subgroups, both in and out of distribution on the CheXpert and MIMIC datasets, establishing a practical and effective path toward equitable deep learning deployment in clinical radiology.
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- Asia > Singapore (0.04)
- Asia > Middle East > Israel (0.04)
Health Insurance Coverage Rule Interpretation Corpus: Law, Policy, and Medical Guidance for Health Insurance Coverage Understanding
U.S. health insurance is complex, and inadequate understanding and limited access to justice have dire implications for the most vulnerable. Advances in natural language processing present an opportunity to support efficient, case-specific understanding, and to improve access to justice and healthcare. Yet existing corpora lack context necessary for assessing even simple cases. We collect and release a corpus of reputable legal and medical text related to U.S. health insurance. We also introduce an outcome prediction task for health insurance appeals designed to support regulatory and patient self-help applications, and release a labeled benchmark for our task, and models trained on it.
- North America > United States > California (0.05)
- North America > United States > Ohio (0.04)
- North America > United States > New York (0.04)
- (4 more...)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- (5 more...)
AI for Senior Citizens
We are now living longer, and the number of people worldwide aged 65 and over is expected to grow from 703 million in 2019 to 2.2 billion in 2080, according to the World Population Prospects Report published by the United Nations last year. The proportion of the global population that is elderly is also on the rise, almost doubling from 5.5% in 1974 to 10.3% last year, and it is projected to grow to 20.7% by 2074. A consequence of aging is that we are more likely to have medical problems. At the same time, the healthcare system in many countries is already stretched due to a lack of workers. "There are just not enough doctors and nurses to deal with a growing elderly population," said Massimiliano Zecca, a professor of healthcare technology at Loughborough University in the U.K. In the U.S, for example, a severe shortage of doctors is expected by 2034, with between 37,800 and 124,000 physicians lacking, partly fueled by the growing number of seniors, according to a recent report by the Association of American Medical Colleges (AAMC).
- Europe > United Kingdom > England > Leicestershire > Loughborough (0.25)
- North America > Canada > Ontario > Toronto (0.16)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- Asia > Japan (0.05)
Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs
Karim, Ahmed Akib Jawad, Mahmud, Muhammad Zawad, Islam, Samiha, Azam, Aznur
In this research, we explored the improvement in terms of multi-class disease classification via pre-trained language models over Medical-Abstracts-TC-Corpus that spans five medical conditions. We excluded non-cancer conditions and examined four specific diseases. We assessed four LLMs, BioBERT, XLNet, and BERT, as well as a novel base model (Last-BERT). BioBERT, which was pre-trained on medical data, demonstrated superior performance in medical text classification (97% accuracy). Surprisingly, XLNet followed closely (96% accuracy), demonstrating its generalizability across domains even though it was not pre-trained on medical data. LastBERT, a custom model based on the lighter version of BERT, also proved competitive with 87.10% accuracy (just under BERT's 89.33%). Our findings confirm the importance of specialized models such as BioBERT and also support impressions around more general solutions like XLNet and well-tuned transformer architectures with fewer parameters (in this case, LastBERT) in medical domain tasks.
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.05)
- North America > United States > California > Orange County > Irvine (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
MediTOD: An English Dialogue Dataset for Medical History Taking with Comprehensive Annotations
Saley, Vishal Vivek, Saha, Goonjan, Das, Rocktim Jyoti, Raghu, Dinesh, Mausam, null
Medical task-oriented dialogue systems can assist doctors by collecting patient medical history, aiding in diagnosis, or guiding treatment selection, thereby reducing doctor burnout and expanding access to medical services. However, doctor-patient dialogue datasets are not readily available, primarily due to privacy regulations. Moreover, existing datasets lack comprehensive annotations involving medical slots and their different attributes, such as symptoms and their onset, progression, and severity. These comprehensive annotations are crucial for accurate diagnosis. Finally, most existing datasets are non-English, limiting their utility for the larger research community. In response, we introduce MediTOD, a new dataset of doctor-patient dialogues in English for the medical history-taking task. Collaborating with doctors, we devise a questionnaire-based labeling scheme tailored to the medical domain. Then, medical professionals create the dataset with high-quality comprehensive annotations, capturing medical slots and their attributes. We establish benchmarks in supervised and few-shot settings on MediTOD for natural language understanding, policy learning, and natural language generation subtasks, evaluating models from both TOD and biomedical domains. We make MediTOD publicly available for future research.
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (3 more...)
GPT-4V Cannot Generate Radiology Reports Yet
Jiang, Yuyang, Chen, Chacha, Nguyen, Dang, Mervak, Benjamin M., Tan, Chenhao
GPT-4V's purported strong multimodal abilities raise interests in using it to automate radiology report writing, but there lacks thorough evaluations. In this work, we perform a systematic evaluation of GPT-4V in generating radiology reports on two chest X-ray report datasets: MIMIC-CXR and IU X-Ray. We attempt to directly generate reports using GPT-4V through different prompting strategies and find that it fails terribly in both lexical metrics and clinical efficacy metrics. To understand the low performance, we decompose the task into two steps: 1) the medical image reasoning step of predicting medical condition labels from images; and 2) the report synthesis step of generating reports from (groundtruth) conditions. We show that GPT-4V's performance in image reasoning is consistently low across different prompts. In fact, the distributions of model-predicted labels remain constant regardless of which groundtruth conditions are present on the image, suggesting that the model is not interpreting chest X-rays meaningfully. Even when given groundtruth conditions in report synthesis, its generated reports are less correct and less natural-sounding than a finetuned LLaMA-2. Altogether, our findings cast doubt on the viability of using GPT-4V in a radiology workflow.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Michigan (0.04)
- North America > United States > Indiana (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Can Public LLMs be used for Self-Diagnosis of Medical Conditions ?
Balasubramanian, Nikil Sharan Prabahar, Dakshit, Sagnik
Advancements in deep learning have generated a large-scale interest in the development of foundational deep learning models. The development of Large Language Models (LLM) has evolved as a transformative paradigm in conversational tasks, which has led to its integration and extension even in the critical domain of healthcare. With LLMs becoming widely popular and their public access through open-source models and integration with other applications, there is a need to investigate their potential and limitations. One such crucial task where LLMs are applied but require a deeper understanding is that of self-diagnosis of medical conditions based on bias-validating symptoms in the interest of public health. The widespread integration of Gemini with Google search and GPT-4.0 with Bing search has led to a shift in the trend of self-diagnosis using search engines to conversational LLM models. Owing to the critical nature of the task, it is prudent to investigate and understand the potential and limitations of public LLMs in the task of self-diagnosis. In this study, we prepare a prompt engineered dataset of 10000 samples and test the performance on the general task of self-diagnosis. We compared the performance of both the state-of-the-art GPT-4.0 and the fee Gemini model on the task of self-diagnosis and recorded contrasting accuracies of 63.07% and 6.01%, respectively. We also discuss the challenges, limitations, and potential of both Gemini and GPT-4.0 for the task of self-diagnosis to facilitate future research and towards the broader impact of general public knowledge. Furthermore, we demonstrate the potential and improvement in performance for the task of self-diagnosis using Retrieval Augmented Generation.
- North America > United States > Texas > Smith County > Tyler (0.04)
- Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)
- Asia > Japan (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
Bayesian Networks and Machine Learning for COVID-19 Severity Explanation and Demographic Symptom Classification
Ajayi, Oluwaseun T., Cheng, Yu
With the prevailing efforts to combat the coronavirus disease 2019 (COVID-19) pandemic, there are still uncertainties that are yet to be discovered about its spread, future impact, and resurgence. In this paper, we present a three-stage data-driven approach to distill the hidden information about COVID-19. The first stage employs a Bayesian network structure learning method to identify the causal relationships among COVID-19 symptoms and their intrinsic demographic variables. As a second stage, the output from the Bayesian network structure learning, serves as a useful guide to train an unsupervised machine learning (ML) algorithm that uncovers the similarities in patients' symptoms through clustering. The final stage then leverages the labels obtained from clustering to train a demographic symptom identification (DSID) model which predicts a patient's symptom class and the corresponding demographic probability distribution. We applied our method on the COVID-19 dataset obtained from the Centers for Disease Control and Prevention (CDC) in the United States. Results from the experiments show a testing accuracy of 99.99%, as against the 41.15% accuracy of a heuristic ML method. This strongly reveals the viability of our Bayesian network and ML approach in understanding the relationship between the virus symptoms, and providing insights on patients' stratification towards reducing the severity of the virus.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > Canada (0.04)
- (4 more...)
The Role of Emotions in Informational Support Question-Response Pairs in Online Health Communities: A Multimodal Deep Learning Approach
Jozani, Mohsen, Williams, Jason A., Aleroud, Ahmed, Bhagat, Sarbottam
This study explores the relationship between informational support seeking questions, responses, and helpfulness ratings in online health communities. We created a labeled data set of question-response pairs and developed multimodal machine learning and deep learning models to reliably predict informational support questions and responses. We employed explainable AI to reveal the emotions embedded in informational support exchanges, demonstrating the importance of emotion in providing informational support. This complex interplay between emotional and informational support has not been previously researched. The study refines social support theory and lays the groundwork for the development of user decision aids. Further implications are discussed.
- North America > United States > Wisconsin > Eau Claire County > Eau Claire (0.14)
- North America > United States > Hawaii (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > China (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning
Chen, Zhongzhi, Sun, Xingwu, Jiao, Xianfeng, Lian, Fengzong, Kang, Zhanhui, Wang, Di, Xu, Cheng-Zhong
Despite the great success of large language models (LLMs) in various tasks, they suffer from generating hallucinations. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations using multi-dimensional orthogonal probes. Specifically, it creates multiple orthogonal bases for modeling truth by incorporating orthogonal constraints into the probes. Moreover, we introduce Random Peek, a systematic technique considering an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. By employing this approach, we improved the truthfulness of Llama-2-7B from 40.8\% to 74.5\% on TruthfulQA. Likewise, significant improvements are observed in fine-tuned models. We conducted a thorough analysis of truth features using probes. Our visualization results show that orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.27)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Africa > Middle East > Egypt (0.14)
- (85 more...)
- Research Report > New Finding (1.00)
- Personal > Honors (1.00)
- Transportation > Air (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Sports (1.00)
- (29 more...)