Can AI Solve the Peer Review Crisis? A Large Scale Experiment on LLM's Performance and Biases in Evaluating Economics Papers
Pat Pataranutaporn, Nattavudh Powdthavee, Pattie Maes
arXiv.org Artificial Intelligence
We investigate whether artificial intelligence can address the peer review crisis in economics by analyzing 27,090 evaluations of 9,030 unique submissions using a large language model (LLM). The experiment systematically varies author characteristics (e.g., affiliation, reputation, gender) and publication quality (e.g., top-tier, mid-tier, low-tier, AI-generated papers). The results indicate that LLMs effectively distinguish paper quality but exhibit biases favoring prominent institutions, male authors, and renowned economists. Additionally, LLMs struggle to differentiate high-quality AI-generated papers from genuine top-tier submissions. While LLMs offer efficiency gains, their susceptibility to bias necessitates cautious integration and hybrid peer review models to balance equity and accuracy.
Jan-30-2025
- Country:
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.04)
- Europe
- Portugal (0.04)
- Italy (0.04)
- Germany > North Rhine-Westphalia
- Cologne Region > Bonn (0.04)
- Asia
- Malaysia (0.14)
- Singapore (0.04)
- Thailand (0.04)
- Indonesia (0.04)
- Bangladesh (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Africa
- The Gambia (0.04)
- Nigeria (0.04)
- Liberia (0.04)
- Ghana (0.04)
- South Africa > Western Cape
- Cape Town (0.05)
- Genre:
- Research Report
- New Finding (1.00)
- Experimental Study (1.00)
- Strength High (0.67)
- Industry:
- Law (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Education (0.67)
- Government > Regional Government
- Banking & Finance
- Technology: