Evaluating the quality of published medical research with ChatGPT
Thelwall, Mike, Jiang, Xiaorui, Bath, Peter A.
arXiv.org Artificial Intelligence
Research quality evaluation is important for departmental evaluations and academic career decisions. Unfortunately, evaluators may not have time to fully read the work assessed and may instead rely on the reputation or Journal Impact Factor of the publishing journal, on citation counts for individual articles, or on the reputation or career citations of the author. Whilst journal-based evidence is not optimal (Waltman & Traag, 2021), the main article-level indicator, citation counts, directly reflects only the scholarly impact of work and not its rigour, originality, or societal impact (Aksnes et al., 2019), all of which are relevant quality dimensions (Langfeldt et al., 2020). Moreover, article citation counts are ineffective for newer articles (Wang, 2013). In response, attempts to use Large Language Models (LLMs) to evaluate the quality of academic work have shown that ChatGPT quality scores are at least as effective as citation counts in most fields and substantially better in a few (Thelwall & Yaghi, 2024). Medicine is an exception, however, with ChatGPT research quality scores having a small negative correlation with the mean scores of the submitting department in the Research Excellence Framework (REF) Clinical Medicine Unit of Assessment (UoA) (Thelwall, 2024a,b; Thelwall & Yaghi, 2024).
Nov-4-2024
- Country:
  - Europe > United Kingdom > England
    - Leicestershire > Leicester (0.05)
    - South Yorkshire > Sheffield (0.04)
  - North America > United States
    - California > Marin County > San Rafael (0.04)
    - New York > New York County > New York City (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
    - Strength High (1.00)