Can AI Solve the Peer Review Crisis? A Large Scale Experiment on LLM's Performance and Biases in Evaluating Economics Papers
Pat Pataranutaporn, Nattavudh Powdthavee, Pattie Maes
arXiv.org Artificial Intelligence
We investigate whether artificial intelligence can address the peer review crisis in economics by analyzing 27,090 evaluations of 9,030 unique submissions using a large language model (LLM). The experiment systematically varies author characteristics (e.g., affiliation, reputation, gender) and publication quality (e.g., top-tier, mid-tier, low-tier, and AI-generated papers). The results indicate that LLMs effectively distinguish paper quality but exhibit biases favoring prominent institutions, male authors, and renowned economists. Additionally, LLMs struggle to differentiate high-quality AI-generated papers from genuine top-tier submissions. While LLMs offer efficiency gains, their susceptibility to bias necessitates cautious integration and hybrid peer review models that balance equity and accuracy.
Jan-30-2025
- Country:
- Africa (1.00)
- Asia (1.00)
- Europe > Germany
- North Rhine-Westphalia (0.14)
- North America > United States
- Massachusetts (0.14)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Strength High (0.67)
- Industry:
- Banking & Finance
- Education (0.67)
- Government > Regional Government
- Health & Medicine > Therapeutic Area (1.00)
- Law (1.00)