Can AI Solve the Peer Review Crisis? A Large Scale Experiment on LLM's Performance and Biases in Evaluating Economics Papers
Pat Pataranutaporn, Nattavudh Powdthavee, Pattie Maes
arXiv.org Artificial Intelligence
We investigate whether artificial intelligence can address the peer review crisis in economics by analyzing 27,090 evaluations of 9,030 unique submissions using a large language model (LLM). The experiment systematically varies author characteristics (e.g., affiliation, reputation, gender) and publication quality (e.g., top-tier, mid-tier, low-tier, and AI-generated papers). The results indicate that LLMs effectively distinguish paper quality but exhibit biases favoring prominent institutions, male authors, and renowned economists. Additionally, LLMs struggle to differentiate high-quality AI-generated papers from genuine top-tier submissions. While LLMs offer efficiency gains, their susceptibility to bias necessitates cautious integration and hybrid peer review models that balance equity and accuracy.
Jan-30-2025
- Country:
- Africa (1.00)
- Asia (1.00)
- Europe > Germany
- North Rhine-Westphalia (0.14)
- North America > United States
- Massachusetts (0.14)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Strength High (0.67)
- Industry:
- Banking & Finance
- Education (0.67)
- Government > Regional Government
- Health & Medicine > Therapeutic Area (1.00)
- Law (1.00)