JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models
Wang, Ze, Wu, Zekun, Guan, Xin, Thaler, Michael, Koshiyama, Adriano, Lu, Skylar, Beepath, Sachin, Ertekin, Ediz Jr., Perez-Ortiz, Maria
arXiv.org Artificial Intelligence
This paper presents a novel framework for benchmarking hierarchical gender hiring bias in Large Language Models (LLMs) for resume scoring, revealing significant issues of reverse bias and overdebiasing. Our contributions are fourfold. First, we introduce a framework that uses a real, anonymized resume dataset from the Healthcare, Finance, and Construction industries, handled carefully to avoid confounding factors. It evaluates gender hiring biases across hierarchical levels, including Level bias, Spread bias, Taste-based bias, and Statistical bias, and it can be readily generalized to other social traits and tasks. Second, we propose novel statistical and computational hiring bias metrics based on a counterfactual approach, including Rank After Scoring (RAS), Rank-based Impact Ratio, Permutation Test-Based Metrics, and Fixed Effects Model-based Metrics. These metrics, rooted in labor economics, NLP, and law, enable holistic evaluation of hiring biases. Third, we analyze hiring biases in ten state-of-the-art LLMs. Six of the ten LLMs show significant biases against males in healthcare and finance. An industry-effect regression reveals that the healthcare industry is the most biased against males. GPT-4o and GPT-3.5 are the most biased models, showing significant bias in all three industries. Conversely, Gemini-1.5-Pro, Llama3-8b-Instruct, and Llama3-70b-Instruct are the least biased. The hiring bias of all LLMs, except for Llama3-8b-Instruct and Claude-3-Sonnet, remains consistent regardless of random expansion or reduction of resume content. Finally, we offer a user-friendly demo to facilitate adoption and practical application of the framework.
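To make the counterfactual, permutation-test idea named in the abstract concrete, the following is a minimal sketch and not the paper's own metric definition. It assumes each resume is scored twice by an LLM, once with a male and once with a female gender signal and with all other content held fixed, and it applies a generic paired sign-flip permutation test to the score differences. The function name `paired_permutation_test`, its parameters, and the example scores are all hypothetical.

```python
import random


def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    scores_a and scores_b hold the scores of the same resumes under two
    counterfactual conditions (hypothetical setup, e.g. female vs. male version).
    Returns the observed mean difference and an estimated p-value.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("paired samples must be non-empty and of equal length")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Under the null hypothesis the two versions are exchangeable, so the
    # sign of each paired difference can be flipped at random.
    extreme = 0
    for _ in range(n_permutations):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Illustrative, made-up scores for eight resumes (not data from the paper).
    female_version = [7.5, 8.0, 6.5, 9.0, 7.0, 8.5, 6.0, 7.5]
    male_version = [7.0, 7.5, 6.5, 8.5, 6.5, 8.0, 6.0, 7.0]
    mean_diff, p = paired_permutation_test(female_version, male_version)
    print(f"mean difference = {mean_diff:.3f}, p-value = {p:.4f}")
```

A positive mean difference with a small p-value would indicate that the first version is systematically scored higher than its counterfactual twin. The paper's own metrics (RAS, Rank-based Impact Ratio, and the fixed effects model) may be defined differently from this generic test.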
Jun-17-2024
- Country:
- North America > United States
- District of Columbia > Washington (0.04)
- New York > New York County
- New York City (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Illinois > Cook County
- Chicago (0.04)
- California > Alameda County
- Berkeley (0.04)
- Genre:
- Research Report > Experimental Study (0.95)
- Industry:
- Law (1.00)
- Health & Medicine (1.00)
- Information Technology (0.67)
- Government (0.67)
- Technology: