A Bayesian Model for Multi-stage Censoring
Sadhuka, Shuvom, Lin, Sophia, Berger, Bonnie, Pierson, Emma
Many sequential decision settings in healthcare feature funnel structures characterized by a series of stages, such as screenings or evaluations, where the number of patients who advance to each stage progressively decreases and decisions become increasingly costly. For example, an oncologist may first conduct a breast exam, followed by a mammogram for patients with concerning exams, followed by a biopsy for patients with concerning mammograms. A key challenge is that the ground truth outcome, such as the biopsy result, is only revealed at the end of this funnel. The selective censoring of the ground truth can introduce statistical biases in risk estimation, especially in underserved patient groups, whose outcomes are more frequently censored. We develop a Bayesian model for funnel decision structures, drawing from prior work on selective labels and censoring. We first show in synthetic settings that our model is able to recover the true parameters and predict outcomes for censored patients more accurately than baselines. We then apply our model to a dataset of emergency department visits, where in-hospital mortality is observed only for those who are admitted to either the hospital or the ICU. We find gender-based differences in hospital and ICU admissions. In particular, our model estimates that the mortality risk threshold for ICU admission is higher for women (5.1%) than for men (4.5%).
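The abstract does not spell out the model itself, but the censoring structure it targets can be illustrated with a simple selection-model likelihood: a logistic gate decides who advances through the funnel, and the outcome enters the likelihood only for patients who advanced. Everything below (function names, the logistic forms, the single-stage gate) is an assumption for illustration, not the authors' specification.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def neg_log_lik(params, X, advanced, y):
    """Selection-model likelihood: a logistic gate decides who advances;
    the outcome y contributes only for advanced (uncensored) patients."""
    d = X.shape[1]
    beta, gamma = params[:d], params[d:]
    p_adv = expit(X @ gamma)   # P(advance past the stage)
    p_out = expit(X @ beta)    # P(outcome = 1 | features)
    ll = np.where(
        advanced,
        np.log(p_adv) + y * np.log(p_out) + (1 - y) * np.log(1 - p_out),
        np.log(1 - p_adv),     # censored: only non-advancement is observed
    )
    return -ll.sum()

# Illustrative fit: X is (n, d); `advanced` is boolean; y uses 0 as a
# placeholder for censored patients (those terms are masked out above).
# fit = minimize(neg_log_lik, np.zeros(2 * X.shape[1]),
#                args=(X, advanced, y.astype(float)))
```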
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Israel (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Robustness and sex differences in skin cancer detection: logistic regression vs CNNs
Pedersen, Nikolette, Sydendal, Regitze, Wulff, Andreas, Raumanns, Ralf, Petersen, Eike, Cheplygina, Veronika
Deep learning has been reported to achieve high performance in the detection of skin cancer, yet many challenges regarding the reproducibility of results and biases remain. This study is a replication (different data, same analysis) of a previous study on Alzheimer's disease detection, which studied the robustness of logistic regression (LR) and convolutional neural networks (CNN) across patient sexes. We explore sex bias in skin cancer detection, using the PAD-UFES-20 dataset with LR trained on handcrafted features reflecting dermatological guidelines (ABCDE and the 7-point checklist), and a pre-trained ResNet-50 model. We evaluate these models, in alignment with the replicated study, across multiple training datasets with varied sex composition to determine their robustness. Our results show that both the LR and the CNN were robust to the sex distribution, but the results also revealed that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristic curve (AUROC) for male patients compared to female patients. The data and relevant scripts to reproduce our results are publicly available (https://github.com/...).
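The per-sex evaluation protocol described above (ACC and AUROC stratified by patient sex) reduces to a few lines; a rough sketch with assumed variable names follows, with feature extraction and data loading omitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

def per_sex_metrics(y_true, y_score, sex):
    """ACC and AUROC computed separately for male and female patients."""
    results = {}
    for group in ("male", "female"):
        mask = sex == group
        y_pred = (y_score[mask] >= 0.5).astype(int)
        results[group] = {
            "ACC": accuracy_score(y_true[mask], y_pred),
            "AUROC": roc_auc_score(y_true[mask], y_score[mask]),
        }
    return results

# Illustrative LR arm on handcrafted (ABCDE / 7-point) features:
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# print(per_sex_metrics(y_test, clf.predict_proba(X_test)[:, 1], sex_test))
```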
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > Germany (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (0.95)
A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection
Ive, Julia, Bondaronek, Paulina, Yadav, Vishal, Santel, Daniel, Glauser, Tracy, Cheng, Tina, Strawn, Jeffrey R., Agasthya, Greeshma, Tschida, Jordan, Choo, Sanghyun, Chandrashekar, Mayanka, Kapadia, Anuj J., Pestian, John
Introduction: Healthcare AI models often inherit biases from their training data. While efforts have primarily targeted bias in structured data, mental health heavily depends on unstructured data. This study aims to detect and mitigate linguistic differences related to non-biological differences in the training data of AI models designed to assist in pediatric mental health screening. Our objectives are: (1) to assess the presence of bias by evaluating outcome parity across sex subgroups, (2) to identify bias sources through textual distribution analysis, and (3) to develop a de-biasing method for mental health text data. Methods: We examined classification parity across demographic groups and assessed how gendered language influences model predictions. A data-centric de-biasing method was applied, focusing on neutralizing biased terms while retaining salient clinical information. This methodology was tested on a model for automatic anxiety detection in pediatric patients. Results: Our findings revealed a systematic under-diagnosis of female adolescent patients, with a 4% lower accuracy and a 9% higher False Negative Rate (FNR) compared to male patients, likely due to disparities in information density and linguistic differences in patient notes. Notes for male patients were on average 500 words longer, and linguistic similarity metrics indicated distinct word distributions between genders. Implementing our de-biasing approach reduced diagnostic bias by up to 27%, demonstrating its effectiveness in enhancing equity across demographic groups. Discussion: We developed a data-centric de-biasing framework to address gender-based content disparities within clinical text. By neutralizing biased language and enhancing focus on clinically essential information, our approach demonstrates an effective strategy for mitigating bias in AI healthcare models trained on text.
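The neutralization step described in the Methods might look, in minimal form, like the sketch below. The term map is hypothetical; the authors' actual lexicon and safeguards (e.g., for clinically meaningful sex-specific terms) are not given in the abstract, and grammatical agreement is left unfixed here.

```python
import re

# Hypothetical neutralization map; a real system would use a curated
# clinical lexicon and avoid altering medically relevant terms.
NEUTRAL_MAP = {
    r"\b(she|he)\b": "they",
    r"\b(her|his)\b": "their",
    r"\b(girl|boy)\b": "child",
    r"\b(mother|father)\b": "parent",
}

def neutralize(note: str) -> str:
    """Replace gendered surface forms case-insensitively."""
    for pattern, repl in NEUTRAL_MAP.items():
        note = re.sub(pattern, repl, note, flags=re.IGNORECASE)
    return note

print(neutralize("She reports her anxiety worsened; mother confirms."))
# -> "they reports their anxiety worsened; parent confirms."
```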
- North America > United States > Ohio > Hamilton County > Cincinnati (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Speech
Paterson, Mary, Moor, James, Cutillo, Luisa
Cases of laryngeal cancer are predicted to rise significantly in the coming years. Current diagnostic pathways cause many patients to be incorrectly referred to urgent suspected cancer pathways, putting undue stress on both patients and the medical system. Artificial intelligence offers a promising solution by enabling non-invasive detection of laryngeal cancer from patient speech, which could help prioritise referrals more effectively and reduce inappropriate referrals of non-cancer patients. To realise this potential, open science is crucial. A major barrier in this field is the lack of open-source datasets and reproducible benchmarks, forcing researchers to start from scratch. Our work addresses this challenge by introducing a benchmark suite comprising 36 models trained and evaluated on open-source datasets. These models are accessible in a public repository, providing a foundation for future research. The benchmark covers three different algorithms and three audio feature sets, offering a comprehensive benchmarking framework. We propose standardised metrics and evaluation methodologies to ensure consistent and comparable results across future studies. The presented models include both audio-only inputs and multimodal inputs that incorporate demographic and symptom data, enabling their application to datasets with diverse patient information. By providing these benchmarks, future researchers can evaluate their datasets, refine the models, and use them as a foundation for more advanced approaches. This work aims to provide a baseline of reproducible benchmarks, enabling researchers to compare new methods against these standards and ultimately advance the development of AI tools for detecting laryngeal cancer.
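On the "standardised metrics" point: one common way to make results comparable across studies is to fix the metric set and report bootstrap confidence intervals. The sketch below is a generic example of that practice, not necessarily the repository's exact protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auroc(y_true, y_score, n_boot=1000, seed=0):
    """AUROC with a percentile-bootstrap 95% CI (inputs: numpy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < 2:  # resample needs both classes
            continue
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return roc_auc_score(y_true, y_score), (lo, hi)
```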
- Europe > United Kingdom (0.14)
- Europe > Germany > Saarland > Saarbrücken (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Data Science > Data Mining (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Dataset Distribution Impacts Model Fairness: Single vs. Multi-Task Learning
Raumanns, Ralf, Schouten, Gerard, Pluim, Josien P. W., Cheplygina, Veronika
The influence of bias in datasets on the fairness of model predictions is a topic of ongoing research in various fields. We evaluate the performance of skin lesion classification using ResNet-based CNNs, focusing on patient sex variations in training data and three different learning strategies. We present a linear programming method for generating datasets with varying patient sex and class labels, taking into account the correlations between these variables. We compare three learning strategies: a single-task model, a reinforcing multi-task model, and an adversarial learning scheme. Our observations include: 1) sex-specific training data yields better results, 2) single-task models exhibit sex bias, 3) the reinforcement approach does not remove sex bias, 4) the adversarial model eliminates sex bias in cases involving only female patients, and 5) datasets that include male patients enhance model performance for the male subgroup, even when female patients are the majority. To generalise these findings, future research will examine additional demographic attributes, such as age, and other potentially confounding factors, such as skin colour and artefacts in the skin lesions. We make all data and models available on GitHub.
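The linear programming step (choosing how many images to draw from each sex-by-class cell so the training set hits a target composition without exceeding what is available) might look roughly like this; the cell layout, targets, and availability caps are all illustrative.

```python
import numpy as np
from scipy.optimize import linprog

# Decision variables: counts per (sex, class) cell, ordered as
# x = [female/benign, female/malignant, male/benign, male/malignant]
N = 1000                       # desired training-set size (illustrative)
frac_female = 0.7              # target sex composition
frac_malignant = 0.3           # target class prevalence
avail = [400, 300, 400, 300]   # images available per cell (assumed)

A_eq = [[1, 1, 1, 1],          # total count
        [1, 1, 0, 0],          # female count
        [0, 1, 0, 1]]          # malignant count
b_eq = [N, frac_female * N, frac_malignant * N]

res = linprog(c=np.zeros(4), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, a) for a in avail], method="highs")
counts = np.round(res.x).astype(int)  # how many to sample from each cell
```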
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging
Yang, Yuzhe, Liu, Yujia, Liu, Xin, Gulhane, Avanti, Mastrodicasa, Domenico, Wu, Wei, Wang, Edward J, Sahani, Dushyant W, Patel, Shwetak
Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically marginalized groups such as females or Black patients. The manifestation of such biases could systematically delay essential medical care for certain patient subgroups. In this study, we investigate the algorithmic fairness of state-of-the-art vision-language foundation models in chest X-ray diagnosis across five globally-sourced datasets. Our findings reveal that, compared to board-certified radiologists, these foundation models consistently underdiagnose marginalized groups, with even higher rates seen in intersectional subgroups such as Black female patients. Such demographic biases are present across a wide range of pathologies and demographic attributes. Further analysis of the model embeddings uncovers significant encoding of demographic information. Deploying AI systems with these biases in medical imaging can intensify pre-existing care disparities, posing potential challenges to equitable healthcare access and raising ethical questions about their clinical application.
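Per-subgroup underdiagnosis, including for intersectional subgroups, reduces to the rate at which patients with a pathology are called negative within each group. A rough computation with assumed column names:

```python
import pandas as pd

def underdiagnosis_rates(df: pd.DataFrame) -> pd.DataFrame:
    """False-negative rate per sex-by-race subgroup.

    Assumed columns: y_true (1 = pathology present), y_pred (binary
    model call), sex, race. Intersectional rows fall out of the groupby.
    """
    diseased = df[df["y_true"] == 1]
    fnr = (diseased.groupby(["sex", "race"])["y_pred"]
                   .apply(lambda p: float((p == 0).mean()))
                   .rename("FNR"))
    return fnr.reset_index()
```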
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Hidden in Plain Sight: Undetectable Adversarial Bias Attacks on Vulnerable Patient Populations
Kulkarni, Pranav, Chan, Andrew, Navarathna, Nithya, Chan, Skylar, Yi, Paul H., Parekh, Vishwa S.
The proliferation of artificial intelligence (AI) in radiology has shed light on the risk of deep learning (DL) models exacerbating clinical biases towards vulnerable patient populations. While prior literature has focused on quantifying biases exhibited by trained DL models, demographically targeted adversarial bias attacks on DL models and their implications in the clinical environment remain an underexplored area of research in medical imaging. In this work, we demonstrate that demographically targeted label poisoning attacks can introduce adversarial underdiagnosis bias in DL models and degrade performance on underrepresented groups without impacting overall model performance. Moreover, our results across multiple performance metrics and demographic groups such as sex, age, and their intersectional subgroups indicate that a group's vulnerability to undetectable adversarial bias attacks is directly correlated with its representation in the model's training data.
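A demographically targeted label-poisoning attack of the kind described can be sketched as flipping a fraction of positive labels to negative within one subgroup only, so aggregate metrics barely move while that group's labels degrade. The rate and group below are illustrative, not the paper's setup.

```python
import numpy as np

def poison_labels(y, group, target_group, rate, seed=0):
    """Flip positive labels to negative for `target_group` members only,
    injecting underdiagnosis bias while leaving other groups untouched."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    candidates = np.flatnonzero((group == target_group) & (y == 1))
    n_flip = int(rate * len(candidates))
    flip = rng.choice(candidates, size=n_flip, replace=False)
    y[flip] = 0
    return y

# e.g. y_poisoned = poison_labels(y_train, sex_train, "F", rate=0.5)
```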
- South America > Peru > Lima Department > Lima Province > Lima (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Sex-based Disparities in Brain Aging: A Focus on Parkinson's Disease
Beheshti, Iman, Booth, Samuel, Ko, Ji Hyun
Parkinson's disease (PD) is linked to faster brain aging. Sex is recognized as an important factor in PD: males are twice as likely as females to have the disease, and they experience more severe symptoms and a faster progression rate. Despite previous research, a significant gap remains in understanding the role of sex in brain aging in PD patients. The T1-weighted MRI-driven brain-predicted age difference (brain-PAD) was computed in a group of 373 PD patients from the PPMI database using a robust brain-age estimation framework trained on 949 healthy subjects. Linear regression models were used to investigate the association between brain-PAD and clinical variables in PD, stratified by sex. All female PD patients were used in the correlational analysis, while an equal number of male patients was selected using propensity score matching on age, education level, age of symptom onset, and clinical symptom severity. Although the two groups were matched for demographics and motor and non-motor symptoms, males with Parkinson's disease exhibited a significantly higher mean brain-PAD than their female counterparts. In the propensity score-matched male PD group, brain-PAD was associated with a decline in general cognition, a worse degree of sleep behavior disorder, reduced visuospatial acuity, and caudate atrophy. Conversely, no significant links were observed between these factors and brain-PAD in the female PD group.
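The matching-plus-regression analysis can be sketched as: fit a propensity model for sex on the matching covariates, take 1:1 nearest-neighbor matches for the female patients, then regress brain-PAD on each clinical variable within each sex. Column names (and the assumption that males outnumber females, as in PPMI) are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def match_males_to_females(df: pd.DataFrame, covars: list) -> pd.DataFrame:
    """1:1 nearest-neighbor propensity-score matching, without replacement.
    Assumes a 'sex' column with values 'M'/'F' and more males than females."""
    ps = (LogisticRegression(max_iter=1000)
          .fit(df[covars], (df["sex"] == "M").astype(int))
          .predict_proba(df[covars])[:, 1])
    df = df.assign(ps=ps)
    females = df[df["sex"] == "F"]
    males = df[df["sex"] == "M"].copy()
    picked = []
    for p in females["ps"]:
        i = (males["ps"] - p).abs().idxmin()  # closest remaining male
        picked.append(i)
        males = males.drop(i)
    return pd.concat([females, df.loc[picked]])

# Within each sex, e.g.:
# scipy.stats.linregress(matched_m["moca"], matched_m["brain_pad"])
```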
- North America > Canada > Quebec > Montreal (0.05)
- North America > Canada > Manitoba > Winnipeg Metropolitan Region > Winnipeg (0.04)
- North America > United States > Massachusetts > Middlesex County > Natick (0.04)
- Europe > Poland (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
- Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
An AI model to predict kidney damage, trained on data from veterans, works less well in women
The study was a page-turner: Researchers at Google showed that an artificial intelligence system could predict acute kidney injury, a common killer of hospitalized patients, up to 48 hours in advance. The results were so promising that the Department of Veterans Affairs, which supplied de-identified patient data to help build the AI, said in 2019 that it would immediately start work to bring it to the bedside. But a new study shows how treacherous that journey can be. Researchers found that a replica of the AI system, trained on a predominantly male population of veterans, does not perform nearly as well on women. Their study, published recently in the journal Nature, reports that a model built to approximate Google's AI overestimated the risk for women in certain circumstances and was less accurate in predicting the condition for women overall. "If we have this problem, then half the population won't benefit," said Jie Cao, a Ph.D. student at the University of Michigan and the lead author of the paper.
- North America > United States > Michigan (0.33)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Health & Medicine > Therapeutic Area > Nephrology (1.00)
- Government > Military (1.00)
- Government > Regional Government > North America Government > United States Government (0.37)
AI could reduce gaps in heart attack care for women
Researchers have developed a new artificial-intelligence-based risk score that improves personalized care for female patients with heart attacks. Heart attacks are one of the leading causes of death worldwide, and women who suffer a heart attack have a higher mortality rate than men. This has been a matter of concern to cardiologists for decades and has led to controversy in the medical field about the causes and effects of possible gaps in treatment. The problem starts with the symptoms: unlike men, who usually experience chest pain radiating to the left arm, a heart attack in women often manifests as abdominal pain radiating to the back or as nausea and vomiting.