Identity Theft in AI Conference Peer Review

Communications of the ACM

Academia heavily relies on trust. This trust-based system, however, creates a significant vulnerability: identity theft. In this Opinion column, we describe newly uncovered cases of identity theft in the scientific peer-review process in the research area of artificial intelligence (AI), involving modus operandi that could also disrupt other academic procedures. We begin by outlining the peer-review process, focusing on scientific conferences since they are the most prominent venues of publication in computer science. Peer review is foundational to scientific inquiry, relying on researchers to voluntarily apply their expertise in evaluating scientific papers.


Identity Theft in AI Conference Peer Review

Shah, Nihar B., Bok, Melisa, Liu, Xukun, McCallum, Andrew

arXiv.org Artificial Intelligence

Abstract: We discuss newly uncovered cases of identity theft in the scientific peer-review process within artificial intelligence (AI) research, with broader implications for other academic procedures. We detail how dishonest researchers exploit the peer-review system by creating fraudulent reviewer profiles to manipulate paper evaluations, leveraging weaknesses in reviewer recruitment workflows and identity verification processes. The findings highlight the critical need for stronger safeguards against identity theft in peer review and academia at large, and to this end, we also propose mitigating strategies.


Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review

Li, Zhuochun, Ji, Yuelyu, Meng, Rui, He, Daqing

arXiv.org Artificial Intelligence

While reasoning capabilities typically emerge in large language models (LLMs) with tens of billions of parameters, recent research focuses on improving smaller open-source models through knowledge distillation (KD) from commercial LLMs. However, many of these studies rely solely on responses from a single LLM as the gold rationale, unlike the natural human learning process, which involves understanding both the correct answers and the reasons behind mistakes. In this paper, we introduce a novel Fault-Aware Distillation via Peer-Review (FAIR) approach: 1) Instead of merely obtaining gold rationales from teachers, our method asks teachers to identify and explain the student's mistakes, providing customized instruction learning data. 2) We design a simulated peer-review process between teacher LLMs, which selects only the generated rationales above an acceptance threshold. This reduces the chance of a teacher guessing correctly with a flawed rationale, improving instructional data quality. Comprehensive experiments and analysis on mathematical, commonsense, and logical reasoning tasks demonstrate the effectiveness of our method.
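The simulated peer review in step 2 can be sketched as a simple filter: each teacher scores the rationales produced by its peers, and only rationales whose mean peer score clears an acceptance threshold are kept as distillation data. The function and parameter names below are illustrative assumptions, not the authors' implementation.

```python
def peer_review_filter(rationales, teachers, score_fn, threshold=0.7):
    """Keep rationales whose mean peer score meets the threshold.

    rationales: list of (author_id, rationale_text) pairs
    teachers:   list of teacher ids
    score_fn:   score_fn(reviewer_id, rationale_text) -> float in [0, 1]
    """
    accepted = []
    for author, rationale in rationales:
        # Each teacher reviews every rationale except its own.
        peers = [t for t in teachers if t != author]
        scores = [score_fn(t, rationale) for t in peers]
        if sum(scores) / len(scores) >= threshold:
            accepted.append((author, rationale))
    return accepted
```

In the paper, `score_fn` would be a teacher LLM judging a peer's rationale; here it is left abstract so the filtering logic stands on its own.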


Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions

Tan, Cheng, Lyu, Dongxin, Li, Siyuan, Gao, Zhangyang, Wei, Jingxuan, Ma, Siqi, Liu, Zicheng, Li, Stan Z.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fails to capture the dynamic and iterative nature of real-world peer reviews. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing 26,841 papers with 92,017 reviews collected from multiple sources, including top-tier conferences and prestigious journals. This dataset is meticulously designed to facilitate the application of LLMs to multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.
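One way to picture the role-based, multi-turn dialogue the abstract describes is a paper thread as an ordered list of turns, each tagged with its role. The field names below are illustrative, not the schema of the released ReviewMT dataset.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str   # "author", "reviewer", or "decision_maker"
    text: str

# A toy thread tracing one paper through the dialogue.
thread = [
    Turn("author", "We submit a paper on X."),
    Turn("reviewer", "Strengths: ... Weaknesses: ..."),
    Turn("author", "Rebuttal: we address the stated weakness by ..."),
    Turn("decision_maker", "Accept."),
]
```

Framed this way, each role's per-turn outputs can be scored separately, which is what the paper's role-specific metrics require.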


Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer Review

Niu, Liang, Xue, Nian, Pöpper, Christina

arXiv.org Artificial Intelligence

Peer review is the method employed by the scientific community for evaluating research advancements. In the field of cybersecurity, the practice of double-blind peer review is the de-facto standard. This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences. Specifically, we investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models. To facilitate our study, we construct a comprehensive dataset by collecting thousands of papers from renowned computer science conferences and the arXiv preprint website. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers. In our experimental evaluation of review-outcome prediction, the Doc2Vec-based approach performs significantly better than ChatGPT, achieving an accuracy of over 90%. While analyzing the experimental results, we identify the potential advantages and limitations of the tested ML models. We explore areas within the paper-reviewing process that can benefit from automated support approaches, while also recognizing the irreplaceable role of human intellect in certain aspects that cannot be matched by state-of-the-art AI techniques.
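The two-stage pipeline has the shape embed-then-classify: stage 1 maps each paper to a fixed-length vector, stage 2 feeds that vector to a standard classifier. The paper uses Doc2Vec embeddings with various classifiers; to keep this sketch self-contained and dependency-free, a plain bag-of-words vector stands in for Doc2Vec and a nearest-centroid rule stands in for the classifier, and the toy corpus is invented for illustration.

```python
from collections import Counter

def embed(text, vocab):
    """Stage 1: fixed-length count vector over a shared vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def predict(vec, centroids):
    """Stage 2: assign the label of the nearest class centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(vec, centroids[label]))

# Toy training corpus: 1 = accept, 0 = reject (illustrative only).
train = [("novel attack on tls with formal proof", 1),
         ("formal verification of tls", 1),
         ("short survey of passwords", 0),
         ("notes on passwords and usability", 0)]
vocab = sorted({w for text, _ in train for w in text.split()})
by_label = {lab: [embed(t, vocab) for t, l in train if l == lab]
            for lab in (0, 1)}
cents = {lab: centroid(vecs) for lab, vecs in by_label.items()}
label = predict(embed("formal proof of a tls attack", vocab), cents)
```

Swapping in real Doc2Vec vectors and a trained classifier changes only the two stage functions; the pipeline structure is the same.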


GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study

Robertson, Zachary

arXiv.org Artificial Intelligence

In this pilot study, we investigate the use of GPT4 to assist in the peer-review process. Our key hypothesis was that GPT-generated reviews could achieve comparable helpfulness to human reviewers. By comparing reviews generated by both human reviewers and GPT models for academic papers submitted to a major machine learning conference, we provide initial evidence that artificial intelligence can contribute effectively to the peer-review process. We also perform robustness experiments with inserted errors to understand which parts of the paper the model tends to focus on. Our findings open new avenues for leveraging machine learning tools to address resource constraints in peer review. The results also shed light on potential enhancements to the review process and lay the groundwork for further research on scaling oversight in a domain where human feedback is an increasingly scarce resource.


How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

Rastogi, Charvi, Stelmakh, Ivan, Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy, Vaughan, Jennifer Wortman, Xue, Zhenyu, Daumé, Hal III, Pierson, Emma, Shah, Nihar B.

arXiv.org Artificial Intelligence

How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors have roughly a three-fold overestimate of the acceptance probability of their papers: The median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated, but better than those of authors who were not invited to review. (3) Authors' relative ranking of the scientific contribution of two submissions they made generally agrees (93%) with their predicted acceptance probabilities, but in a notable 7% of responses authors think their better paper will face a worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate -- about a third of the time. (5) At least 30% of respondents for both accepted and rejected papers said that their perception of their own paper improved after the review process. The stakeholders in peer review should take these findings into account in setting their expectations from peer review.
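The "roughly three-fold overestimate" in result (1) follows directly from the two reported numbers: a median predicted acceptance probability of 70% against an actual acceptance rate of about 25%.

```python
# Reported figures from the NeurIPS 2021 author survey.
median_prediction = 0.70   # median self-predicted acceptance probability
acceptance_rate = 0.25     # approximate actual acceptance rate

# 0.70 / 0.25 = 2.8, i.e. roughly a three-fold overestimate.
overestimate = median_prediction / acceptance_rate
```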


Should AI have a role in assessing research quality?

#artificialintelligence

CERN, Europe's particle-physics laboratory, produces vast amounts of data, which are stored at its computer centre (pictured) and analysed with the help of artificial intelligence (AI). UK funders want to know whether AI could also assist in peer reviewing thousands of research outputs for nationwide quality audits. (Credit: Dean Mouhtaropoulos/Getty)

Efforts to ease the workloads of peer reviewers by using artificial intelligence (AI) are gathering pace -- with one country's main research-evaluation exercise actively looking into ways of harnessing the technology. A study commissioned by the United Kingdom's main public research-funding bodies is examining how algorithms can assist in conducting peer review on journal articles submitted to the UK's Research Excellence Framework (REF). The REF, a national quality audit that measures the impact of research carried out at UK higher-education institutions, is a huge undertaking. In the latest iteration, the results of which were published in May 2022, more than 185,000 research outputs were evaluated from more than 76,000 academics based at 157 UK institutions.


Challenges, Experiments, and Computational Solutions in Peer Review

Communications of the ACM

While researchers are trained to do research, there is little training for peer review. Several initiatives and experiments have looked to address this challenge. Recently, the ICML 2020 conference adopted a method to select and then mentor junior reviewers, who would not have been asked to review otherwise, with a motivation of expanding the reviewer pool to address the large volume of submissions.43 An analysis of their reviews revealed that the junior reviewers were more engaged through various stages of the process as compared to conventional reviewers. Moreover, the conference asked meta reviewers to rate all reviews, and 30% of reviews written by junior reviewers received the highest rating by meta reviewers, in contrast to 14% for the main pool. Training reviewers at the beginning of their careers is a good start but may not be enough. There is some evidence8 that the quality of an individual's reviews falls over time, at a slow but steady rate, possibly because of increasing time constraints or in reaction to poor-quality reviews they themselves receive.


Institutionalising Ethics in AI through Broader Impact Requirements

Prunkl, Carina, Ashurst, Carolyn, Anderljung, Markus, Webb, Helena, Leike, Jan, Dafoe, Allan

arXiv.org Artificial Intelligence

Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) governance. In this article, we reflect on a novel governance initiative by one of the world's largest AI conferences. In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research. Drawing insights from similar governance initiatives, including institutional review boards (IRBs) and impact requirements for funding applications, we investigate the risks, challenges and potential benefits of such an initiative. Among the challenges, we list a lack of recognised best practice and procedural transparency, researcher opportunity costs, institutional and social pressures, cognitive biases, and the inherently difficult nature of the task. The potential benefits, on the other hand, include improved anticipation and identification of impacts, better communication with policy and governance experts, and a general strengthening of the norms around responsible research. To maximise the chance of success, we recommend measures to increase transparency, improve guidance, create incentives to engage earnestly with the process, and facilitate public deliberation on the requirement's merits and future. Perhaps the most important contributions from this analysis are the insights we can gain regarding effective community-based governance and the role and responsibility of the AI research community more broadly.