Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts
Preethi Seshadri, Seraphina Goldfarb-Tarrant
arXiv.org Artificial Intelligence
Large language models (LLMs) are increasingly being deployed in high-stakes applications like hiring, yet their potential for unfair decision-making and outcomes remains understudied, particularly in generative settings. In this work, we examine the fairness of LLM-based hiring systems through two real-world tasks: resume summarization and retrieval. By constructing a synthetic resume dataset and curating job postings, we investigate whether model behavior differs across demographic groups and is sensitive to demographic perturbations. Our findings reveal that race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%. In the retrieval setting, all evaluated models display non-uniform selection patterns across demographic groups and exhibit high sensitivity to both gender and race-based perturbations. Surprisingly, retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem, in part, from general model brittleness. Overall, our results indicate that LLM-based hiring systems, especially at the retrieval stage, can exhibit notable biases that lead to discriminatory outcomes in real-world contexts.
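The perturbation-sensitivity test the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the relevance scorer below is a toy token-overlap stand-in for a real retrieval model, and the name swap is a hypothetical example of a demographic perturbation. The metric is the fraction of single-resume perturbations that change the retrieved top-k set.

```python
# Sketch of a demographic perturbation-sensitivity check for resume
# retrieval. The scorer is a toy Jaccard-overlap function standing in
# for a real retrieval model; names and swaps are illustrative only.

def score(job: str, resume: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase tokens."""
    a, b = set(job.lower().split()), set(resume.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def top_k(job: str, resumes: list[str], k: int = 2) -> list[int]:
    """Indices of the k highest-scoring resumes for a job posting."""
    ranked = sorted(range(len(resumes)), key=lambda i: -score(job, resumes[i]))
    return sorted(ranked[:k])

def perturbation_sensitivity(job: str, resumes: list[str],
                             swaps: list[tuple[str, str]], k: int = 2) -> float:
    """Fraction of single-resume perturbations that change the top-k set."""
    base = top_k(job, resumes, k)
    changed, total = 0, 0
    for i, resume in enumerate(resumes):
        for old, new in swaps:
            if old in resume:
                perturbed = resumes[:i] + [resume.replace(old, new)] + resumes[i + 1:]
                total += 1
                changed += top_k(job, perturbed, k) != base
    return changed / total if total else 0.0
```

With a real retrieval model plugged in as `score`, a nonzero sensitivity for name swaps alone (e.g. swapping names associated with different demographic groups) indicates the fairness-relevant brittleness the paper measures; comparing against non-demographic edits separates demographic bias from general instability.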
Jan-8-2025