disability bias
AccessEval: Benchmarking Disability Bias in Large Language Models
Panda, Srikant, Agarwal, Amit, Patel, Hitesh Laxmichand
Large Language Models (LLMs) are increasingly deployed across diverse domains but often exhibit disparities in how they handle real-life queries. To systematically investigate these effects within various disability contexts, we introduce AccessEval (Accessibility Evaluation), a benchmark evaluating 21 closed- and open-source LLMs across 6 real-world domains and 9 disability types using paired Neutral and Disability-Aware Queries. We evaluated model outputs with metrics for sentiment, social perception, and factual accuracy. Our analysis reveals that responses to disability-aware queries tend to have a more negative tone, increased stereotyping, and higher factual error compared to responses to neutral queries. These effects vary notably by domain and disability type, with disabilities affecting hearing, speech, and mobility disproportionately impacted. These disparities reflect persistent forms of ableism embedded in model behavior. By examining model performance in real-world decision-making contexts, we better illuminate how such biases can translate into tangible harms for disabled users. This framing helps bridge the gap between technical evaluation and user impact, reinforcing the importance of bias mitigation in day-to-day applications. Our dataset is publicly available at: https://huggingface.co/datasets/Srikant86/AccessEval
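The paired Neutral / Disability-Aware design described above lends itself to a simple per-pair comparison. The following is a minimal sketch, not the AccessEval implementation: the record layout, the field names (`disability`, `neutral`, `aware`), the function name `paired_gap_by_group`, and the scores themselves are all hypothetical placeholders chosen only to show the shape of the computation.

```python
from collections import defaultdict
from statistics import mean

# Placeholder records, invented purely to show the data shape; they are NOT
# AccessEval results. Each record pairs a metric score for the neutral query
# with the score for its disability-aware counterpart.
records = [
    {"disability": "hearing", "neutral": 0.80, "aware": 0.70},
    {"disability": "hearing", "neutral": 0.60, "aware": 0.55},
    {"disability": "vision", "neutral": 0.75, "aware": 0.75},
]

def paired_gap_by_group(rows, group_key="disability"):
    """Mean (aware - neutral) difference per group.

    A negative value means the disability-aware response scored lower
    on the chosen metric than its neutral counterpart.
    """
    gaps = defaultdict(list)
    for row in rows:
        gaps[row[group_key]].append(row["aware"] - row["neutral"])
    return {group: mean(values) for group, values in gaps.items()}

gaps = paired_gap_by_group(records)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.3f}")
```

Grouping by domain instead of disability type would only require a different `group_key`, assuming the records carry such a field.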
Who Gets Left Behind? Auditing Disability Inclusivity in Large Language Models
Dash, Deepika, Bangera, Yeshil, Bangera, Mithil, Vadithya, Gouthami, Panda, Srikant
Large Language Models (LLMs) are increasingly used for accessibility guidance, yet many disability groups remain underserved by their advice. To address this gap, we present a taxonomy-aligned benchmark of human-validated, general-purpose accessibility questions, designed to systematically audit inclusivity across disabilities. Our benchmark evaluates models along three dimensions: Question-Level Coverage (breadth within answers), Disability-Level Coverage (balance across nine disability categories), and Depth (specificity of support). Applying this framework to 17 proprietary and open-weight models reveals persistent inclusivity gaps: Vision, Hearing, and Mobility are frequently addressed, while Speech, Genetic/Developmental, Sensory-Cognitive, and Mental Health remain underserved. Depth is similarly concentrated in a few categories but sparse elsewhere. These findings reveal who gets left behind in current LLM accessibility guidance and highlight actionable levers: taxonomy-aware prompting/training and evaluations that jointly audit breadth, balance, and depth.
Identifying and Improving Disability Bias in GAI-Based Resume Screening
Glazko, Kate, Mohammed, Yusuf, Kosa, Ben, Potluri, Venkatesh, Mankoff, Jennifer
As Generative AI rises in adoption, its use has expanded to include domains such as hiring and recruiting. However, without examining the potential for bias, this may negatively impact marginalized populations, including people with disabilities. To address this important concern, we present a resume audit study, in which we ask ChatGPT (specifically, GPT-4) to rank a resume against the same resume enhanced with an additional leadership award, scholarship, panel presentation, and membership that are disability related. We find that GPT-4 exhibits prejudice towards these enhanced CVs. Further, we show that this prejudice can be quantifiably reduced by training a custom GPT on principles of DEI and disability justice. Our study also includes a unique qualitative analysis of the types of direct and indirect ableism GPT-4 uses to justify its biased decisions and suggests directions for additional bias mitigation work. Additionally, since these justifications are presumably drawn from training data containing real-world biased statements made by humans, our analysis suggests additional avenues for understanding and addressing human bias.
A Democratic Platform for Engaging with Disabled Community in Generative AI Development
Artificial Intelligence (AI) systems, especially generative AI technologies, are becoming more relevant in our society. Tools like ChatGPT are being used by members of the disabled community; for example, autistic people may use them to help compose emails. The growing impact and popularity of generative AI tools have prompted us to examine their relevance within the disabled community. The design and development phases often neglect this marginalized group, leading to inaccurate predictions and unfair discrimination directed towards them. This could result from bias in data sets, algorithms, and systems at various phases of creation and implementation. This workshop paper proposes a platform to involve the disabled community while building generative AI systems. With this platform, our aim is to gain insight into the factors that contribute to bias in the outputs generated by generative AI when used by the disabled community. Furthermore, we expect to understand which algorithmic factors are the main contributors to the output's incorrectness or irrelevance. The proposed platform calls on both disabled and non-disabled people from various geographical and cultural backgrounds to collaborate asynchronously and remotely in a democratic approach to decision-making.
Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models
Venkit, Pranav Narayanan, Srinath, Mukund, Wilson, Shomir
We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disabilities (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the Bias Identification Test in Sentiment (BITS) corpus to quantify explicit disability bias in any sentiment analysis or toxicity detection model. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, and DistilBERT, and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
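The idea behind perturbation-based bias testing can be illustrated with a small sketch: hold a sentence template fixed, swap one word, and measure how a model's score moves. Everything below is an assumption made for illustration. The toy lexicon scorer stands in for a real sentiment model (it is not TextBlob, VADER, or any tool from the study), the lexicon deliberately encodes a bias so the effect is visible, and the templates and terms are hypothetical rather than taken from the BITS corpus.

```python
from statistics import mean

# Toy lexicon with a deliberately built-in bias, standing in for a real
# sentiment model (an assumption for illustration only).
TOY_LEXICON = {"great": 1.0, "happy": 0.8, "sad": -0.8, "terrible": -1.0,
               "blind": -0.5, "deaf": -0.5}

def toy_sentiment(sentence: str) -> float:
    """Average lexicon score over the sentence's words (unknown words score 0.0)."""
    words = sentence.lower().replace(".", "").split()
    return mean([TOY_LEXICON.get(w, 0.0) for w in words])

# Hypothetical templates and terms; a real corpus defines its own.
TEMPLATES = ["My {} neighbor is great.", "I met a {} person today."]
BASELINE = "tall"
GROUP_TERMS = ["blind", "deaf"]

def perturbation_shift(score_fn, templates, baseline, terms):
    """Mean score change per term when the baseline word is replaced by that term."""
    shifts = {}
    for term in terms:
        deltas = [score_fn(t.format(term)) - score_fn(t.format(baseline))
                  for t in templates]
        shifts[term] = mean(deltas)
    return shifts

shifts = perturbation_shift(toy_sentiment, TEMPLATES, BASELINE, GROUP_TERMS)
for term, delta in shifts.items():
    print(f"{term}: {delta:+.3f}")
```

Because `perturbation_shift` takes the scoring function as a parameter, a real model's scoring call could be passed in place of `toy_sentiment`; a consistently negative shift across many templates would then suggest the model penalizes the term itself rather than the sentence content.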
Areas of Strategic Visibility: Disability Bias in Biometrics
Mankoff, Jennifer, Kasnitz, Devva, Camp, L Jean, Lazar, Jonathan, Hochheiser, Harry
Yet many of these systems are not accessible to people who experience different kinds of disability exclusion. Different personal characteristics may impact any or all of the physical (DNA, fingerprints, face or retina) and behavioral (gesture, gait, voice) characteristics listed in the RFI as examples of biometric signals. We define disability here in terms of the discriminatory and often systemic problems with available infrastructure's ability to meet the needs of all people (UN, 2017; Oliver, 2013). Using this definition, "[biometrics] could either mitigate or amplify disability depending on how they are designed" (Guo, 2019). As Whittaker and colleagues (2019) state, this is not simply a matter of algorithmic accuracy: "...discrimination against people of color, women, and other historically marginalized groups has often been justified by representing these groups as disabled. Thus disability is entwined with, and serves to justify, practices of marginalization." It is critical that we look beyond inclusion to full and fully accommodated participation.
Disability Bias in AI Hiring Tools Targeted in US Guidance (1)
Employers have a responsibility to inspect artificial intelligence tools for disability bias and should have plans to provide reasonable accommodations, the Equal Employment Opportunity Commission and Justice Department said in guidance documents. The guidance released Thursday is the first from the federal government on the use of AI hiring tools that focuses on their impact on people with disabilities. The guidance also seeks to inform workers of their right to inquire about a company's use of AI and to request accommodations, the agencies said. "Today we are sounding an alarm regarding the dangers of blind reliance on AI and other technologies that are increasingly used by employers," Assistant Attorney General Kristen Clarke told reporters. The DOJ enforces disability discrimination laws with respect to state and local government employers, while the EEOC enforces such laws for private-sector and federal employers.
Addressing Disability Bias In Artificial Intelligence
Unfortunately, bias in AI against individuals with physical disabilities is prevalent in today's digital society. Dealing with this problem requires AI researchers and developers to show basic common sense and, above all, empathy for disabled system users. The question is, how can we develop AI systems that work without bias for all users? As we know, bias in AI is, quite sadly, a reality of our times. There have been numerous instances in which recommendations and other outputs from AI-powered systems have had distinctively racist undertones.