Surgeons Are Indian Males and Speech Therapists Are White Females: Auditing Biases in Vision-Language Models for Healthcare Professionals

Zohaib Hasan Siddiqui, Dayam Nadeem, Mohammad Masudur Rahman, Mohammad Nadeem, Shahab Saquib Sohail, Beenish Moalla Chaudhry

arXiv.org Artificial Intelligence 

Abstract. Vision-language models (VLMs), such as CLIP and OpenCLIP, can encode and reproduce stereotypical associations between medical professions and demographic attributes learned from web-scale data. We present an evaluation protocol for healthcare settings that quantifies these biases and assesses their operational risk. Our methodology (i) defines a taxonomy spanning clinicians and allied healthcare roles (e.g., surgeon, cardiologist, dentist, nurse, pharmacist, technician), (ii) curates a profession-aware prompt suite to probe model behavior, and (iii) benchmarks demographic skew against a balanced face corpus. Empirically, we observe consistent demographic biases across multiple roles and vision models. Our work highlights the importance of identifying bias in critical domains such as healthcare, because AI-enabled hiring and workforce analytics can have downstream implications for equity, compliance, and patient trust.

The code can be found at https://github.com/zohaibhasan066/

Vision-language models (VLMs) are a class of AI architectures that learn joint representations by aligning visual perception with natural-language semantics [1]. Typically, an image encoder is paired with a text encoder, and both are trained to map into a shared embedding space that supports cross-modal correspondence between images and linguistic descriptions. One such instance is OpenAI's CLIP (Contrastive Language-Image Pretraining), which is trained on roughly 400 million image-text pairs and exhibits strong zero-shot ability for …

VLMs enable a broad spectrum of multimodal functionality, including image captioning, visual question answering, and bidirectional text-image retrieval, with downstream applications in search, recommendation, and human-computer interaction.
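The shared-embedding mechanism described above, and the kind of demographic skew measurement the abstract refers to, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released code. The embeddings are toy vectors standing in for CLIP image and text encoder outputs, `logit_scale=100.0` is an assumed temperature (CLIP learns this value during training), and `retrieval_skew` is a hypothetical metric. It compares each demographic group's share among the top-k images retrieved for a profession prompt with that group's share in a balanced face corpus.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """CLIP-style zero-shot scoring: softmax over scaled image-text cosine
    similarities. logit_scale=100.0 is an assumed value, not a fixed constant."""
    logits = [logit_scale * cosine(image_emb, t) for t in text_embs]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def retrieval_skew(prompt_emb, corpus, k):
    """Hypothetical skew metric (illustrative only).

    corpus: list of (image_embedding, group_label) pairs.
    Returns, for each group, (share among the top-k images retrieved for the
    prompt) / (share in the whole corpus). A ratio of 1.0 means parity.
    """
    ranked = sorted(corpus, key=lambda item: cosine(prompt_emb, item[0]), reverse=True)
    top = Counter(label for _, label in ranked[:k])
    base = Counter(label for _, label in corpus)
    n = len(corpus)
    return {g: (top[g] / k) / (base[g] / n) for g in base}


# Toy usage: a balanced corpus of two hypothetical groups, "A" and "B".
corpus = [([1.0, 0.1], "A"), ([0.9, 0.2], "A"), ([0.1, 1.0], "B"), ([0.2, 0.9], "B")]
print(retrieval_skew([1.0, 0.0], corpus, k=2))  # {'A': 2.0, 'B': 0.0}
```

In this toy case the prompt retrieves only group "A", so that group is over-represented by a factor of 2.0 relative to the balanced baseline while "B" is absent. A real audit would presumably aggregate such scores over many prompts, roles, and models rather than read a single query.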
