Application of CARE-SD text classifier tools to assess distribution of stigmatizing and doubt-marking language features in EHR

Walker, Drew; Love, Jennifer; Rajwal, Swati; Walker, Isabel C; Cooper, Hannah LF; Sarker, Abeed; Livingston, Melvin III

Jul-15-2025–arXiv.org Artificial Intelligence

Introduction: Electronic health records (EHR) are a critical medium through which patient stigmatization is perpetuated among healthcare teams.

Methods: We identified linguistic features of doubt markers and stigmatizing labels in MIMIC-III EHR via expanded lexicon matching and supervised learning classifiers. Predictors of the rates of these linguistic features were assessed using Poisson regression models.

Results: We found higher rates of stigmatizing labels per chart among patients who were Black or African American (RR: 1.16), patients with Medicare/Medicaid or government-run insurance (RR: 2.46), self-pay patients (RR: 2.12), and patients with a variety of stigmatized diseases and mental health conditions. Patterns among doubt markers were similar, though male patients had higher rates of doubt markers (RR: 1.25). Nurses (RR: 1.40) and social workers (RR: 2.25) used stigmatizing labels at higher rates, with similar patterns for doubt markers.

Discussion: Stigmatizing language occurred at higher rates among historically stigmatized patients and was perpetuated by multiple provider types.
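The two analysis steps named in the Methods (lexicon matching, then Poisson rate ratios) can be illustrated with a minimal sketch. It assumes hypothetical mini-lexicons (DOUBT_MARKERS, STIGMATIZING_LABELS), a hypothetical binary covariate on each chart, and made-up example notes; it does not reproduce the CARE-SD expanded lexicons or the supervised classifiers. It also computes only a univariable rate ratio per chart: for a Poisson model with a log link, an intercept, and a single binary indicator, the maximum-likelihood estimate of exp(b1) equals the ratio of the two group means, whereas the paper's models may adjust for several predictors at once.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-lexicons for illustration only; NOT the CARE-SD lexicons.
DOUBT_MARKERS = ["claims", "allegedly", "reportedly", "insists"]
STIGMATIZING_LABELS = ["drug seeker", "noncompliant", "frequent flyer"]


def compile_lexicon(terms):
    """Build one case-insensitive regex matching any term as whole words."""
    # Longest terms first so multi-word phrases win over shorter overlaps.
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


DOUBT_RE = compile_lexicon(DOUBT_MARKERS)
STIGMA_RE = compile_lexicon(STIGMATIZING_LABELS)


def count_matches(pattern, text):
    """Number of non-overlapping lexicon hits in one note."""
    return len(pattern.findall(text))


@dataclass
class Chart:
    group: int   # hypothetical binary covariate (0 = reference, 1 = comparison)
    notes: list  # free-text notes belonging to one patient chart


def chart_count(pattern, chart):
    """Total lexicon hits across all notes in one chart."""
    return sum(count_matches(pattern, note) for note in chart.notes)


def rate_ratio(charts, pattern):
    """Univariable rate ratio of hits per chart, group 1 vs group 0.

    Equals exp(b1) from the Poisson MLE of log(E[y]) = b0 + b1 * group,
    because that model reproduces each group's mean count exactly.
    """
    totals = {0: 0, 1: 0}
    n_charts = {0: 0, 1: 0}
    for chart in charts:
        totals[chart.group] += chart_count(pattern, chart)
        n_charts[chart.group] += 1
    if n_charts[0] == 0 or n_charts[1] == 0 or totals[0] == 0:
        raise ValueError("each group needs charts, and group 0 needs at least one hit")
    return (totals[1] / n_charts[1]) / (totals[0] / n_charts[0])


# Made-up example charts, for illustration only.
charts = [
    Chart(0, ["Patient claims pain is 10/10.", "Vitals stable."]),
    Chart(0, ["No acute distress."]),
    Chart(1, ["Patient insists on IV dilaudid; allegedly allergic to NSAIDs.",
              "Noncompliant with meds."]),
    Chart(1, ["Reportedly left AMA."]),
]

print(rate_ratio(charts, DOUBT_RE))  # prints 3.0 (1.5 hits/chart vs 0.5)
```

Word boundaries keep a term such as "claims" from matching inside "reclaims". A full analysis would fit a multivariable Poisson GLM (for example with a statistics library) rather than the closed-form ratio used here.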

Keywords: doubt marker, machine learning, natural language

Country:
- North America > United States (1.00)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Health Care Technology > Medical Record (1.00)
  - Health Care Providers & Services (1.00)
  - Consumer Health (1.00)
  - Therapeutic Area > Psychiatry/Psychology
    - Addiction Disorder (1.00)
- Government > Regional Government
  - North America Government > United States Government (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Statistical Learning
    - Regression (0.88)
