Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts
Preethi Seshadri, Seraphina Goldfarb-Tarrant
arXiv.org Artificial Intelligence
Large language models (LLMs) are increasingly being deployed in high-stakes applications like hiring, yet their potential for unfair decision-making and outcomes remains understudied, particularly in generative settings. In this work, we examine the fairness of LLM-based hiring systems through two real-world tasks: resume summarization and retrieval. By constructing a synthetic resume dataset and curating job postings, we investigate whether model behavior differs across demographic groups and is sensitive to demographic perturbations. Our findings reveal that race-based differences appear in approximately 10% of generated summaries, while gender-based differences occur in only 1%. In the retrieval setting, all evaluated models display non-uniform selection patterns across demographic groups and exhibit high sensitivity to both gender and race-based perturbations. Surprisingly, retrieval models demonstrate comparable sensitivity to non-demographic changes, suggesting that fairness issues may stem, in part, from general model brittleness. Overall, our results indicate that LLM-based hiring systems, especially at the retrieval stage, can exhibit notable biases that lead to discriminatory outcomes in real-world contexts.
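The perturbation-sensitivity test the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the relevance scorer below is a toy token-overlap stand-in for a real retrieval model, and the name swap is a hypothetical example of a demographic perturbation. The metric is the fraction of single-resume perturbations that change the retrieved top-k set.

```python
# Sketch of a demographic perturbation-sensitivity check for resume
# retrieval. The scorer is a toy Jaccard-overlap function standing in
# for a real retrieval model; names and swaps are illustrative only.

def score(job: str, resume: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase tokens."""
    a, b = set(job.lower().split()), set(resume.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def top_k(job: str, resumes: list[str], k: int = 2) -> list[int]:
    """Indices of the k highest-scoring resumes for a job posting."""
    ranked = sorted(range(len(resumes)), key=lambda i: -score(job, resumes[i]))
    return sorted(ranked[:k])

def perturbation_sensitivity(job: str, resumes: list[str],
                             swaps: list[tuple[str, str]], k: int = 2) -> float:
    """Fraction of single-resume perturbations that change the top-k set."""
    base = top_k(job, resumes, k)
    changed, total = 0, 0
    for i, resume in enumerate(resumes):
        for old, new in swaps:
            if old in resume:
                perturbed = resumes[:i] + [resume.replace(old, new)] + resumes[i + 1:]
                total += 1
                changed += top_k(job, perturbed, k) != base
    return changed / total if total else 0.0
```

With a real retrieval model plugged in as `score`, a nonzero sensitivity for name swaps alone (e.g. swapping names associated with different demographic groups) indicates the fairness-relevant brittleness the paper measures; comparing against non-demographic edits separates demographic bias from general instability.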
Jan-8-2025