Identity Theft in AI Conference Peer Review
Academia heavily relies on trust. This trust-based system, however, creates a significant vulnerability: identity theft. In this Opinion column, we describe newly uncovered cases of identity theft within the scientific peer-review process in the research area of artificial intelligence (AI), involving a modus operandi that could also disrupt other academic procedures. We begin by outlining the peer-review process, focusing on scientific conferences, since they are the most prominent publication venues in computer science. Peer review is foundational to scientific inquiry, relying on researchers to voluntarily apply their expertise in evaluating scientific papers.
Identity Theft in AI Conference Peer Review
Shah, Nihar B., Bok, Melisa, Liu, Xukun, McCallum, Andrew
Abstract: We discuss newly uncovered cases of identity theft in the scientific peer-review process within artificial intelligence (AI) research, with broader implications for other academic procedures. We detail how dishonest researchers exploit the peer-review system by creating fraudulent reviewer profiles to manipulate paper evaluations, leveraging weaknesses in reviewer recruitment workflows and identity verification processes. The findings highlight the critical need for stronger safeguards against identity theft in peer review and academia at large, and to this end, we also propose mitigating strategies.
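One kind of safeguard that such mitigating strategies could include is an automated check on reviewer sign-ups. The sketch below flags recruits whose contact email does not match their claimed institutional affiliation; it is a hypothetical illustration, not the authors' actual proposal, and the domains and function names are invented.

```python
# Hypothetical safeguard for reviewer recruitment: route sign-ups to
# manual identity checks when the contact email does not match the
# claimed institutional affiliation. Domains below are illustrative.
TRUSTED_DOMAINS = {"cs.cmu.edu", "cs.umass.edu"}

def needs_manual_verification(claimed_domain: str, email: str) -> bool:
    """Return True when a reviewer sign-up warrants human identity checks."""
    email_domain = email.rsplit("@", 1)[-1].lower()
    if email_domain != claimed_domain.lower():
        # e.g. a free-mail address paired with a claimed university affiliation
        return True
    # Even a matching domain is only trusted if it is on the allow-list.
    return email_domain not in TRUSTED_DOMAINS
```

A check like this cannot confirm identity on its own; it only decides which sign-ups get routed to a human for verification.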
Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review
Li, Zhuochun, Ji, Yuelyu, Meng, Rui, He, Daqing
While reasoning capabilities typically emerge in large language models (LLMs) with tens of billions of parameters, recent research focuses on improving smaller open-source models through knowledge distillation (KD) from commercial LLMs. However, many of these studies rely solely on responses from a single LLM as the gold rationale, unlike the natural human learning process, which involves understanding both the correct answers and the reasons behind mistakes. In this paper, we introduce a novel Fault-Aware Distillation via Peer-Review (FAIR) approach: 1) Instead of merely obtaining gold rationales from teachers, our method asks teachers to identify and explain the student's mistakes, providing customized instruction learning data. 2) We design a simulated peer-review process between teacher LLMs, which selects only the generated rationales above the acceptance threshold. This reduces the chance of teachers guessing correctly with flawed rationales, improving instructional data quality. Comprehensive experiments and analysis on mathematical, commonsense, and logical reasoning tasks demonstrate the effectiveness of our method.
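The teacher-side peer review in point 2 can be pictured as a simple filter: each teacher's rationale is scored by the other teachers, and only rationales whose mean score clears an acceptance threshold survive. A minimal sketch, assuming illustrative scoring functions stand in for the teacher LLMs:

```python
from statistics import mean

def peer_review_filter(rationales, reviewers, threshold=3.5):
    """Keep rationales whose mean peer score clears the threshold.

    `rationales` maps a teacher's name to its generated rationale;
    `reviewers` maps a teacher's name to a function that scores another
    teacher's rationale (here on a 1-5 scale).
    """
    accepted = {}
    for author, rationale in rationales.items():
        # Each rationale is scored only by the *other* teachers.
        scores = [score(rationale)
                  for name, score in reviewers.items() if name != author]
        if scores and mean(scores) >= threshold:
            accepted[author] = rationale
    return accepted
```

In the paper the scorers are LLMs judging rationale quality; the threshold and 1-5 scale here are assumptions for the sketch.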
Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions
Tan, Cheng, Lyu, Dongxin, Li, Siyuan, Gao, Zhangyang, Wei, Jingxuan, Ma, Siqi, Liu, Zicheng, Li, Stan Z.
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields and have shown significant potential in the academic peer-review process. However, existing applications are primarily limited to static review generation based on submitted papers, which fails to capture the dynamic and iterative nature of real-world peer review. In this paper, we reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers. We construct a comprehensive dataset containing 26,841 papers with 92,017 reviews collected from multiple sources, including a top-tier conference and a prestigious journal. This dataset is meticulously designed to facilitate the application of LLMs to multi-turn dialogues, effectively simulating the complete peer-review process. Furthermore, we propose a series of metrics to evaluate the performance of LLMs for each role under this reformulated peer-review setting, ensuring fair and comprehensive evaluations. We believe this work provides a promising perspective on enhancing the LLM-driven peer-review process by incorporating dynamic, role-based interactions. It aligns closely with the iterative and interactive nature of real-world academic peer review, offering a robust foundation for future research and development in this area. We open-source the dataset at https://github.com/chengtan9907/ReviewMT.
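The role-based, multi-turn reformulation can be pictured with a small data structure: a dialogue accumulates turns tagged with one of the three roles, and the full history is flattened into one long context for the model. This is an illustrative sketch, not the dataset's actual schema; the role names and methods are assumptions.

```python
from dataclasses import dataclass, field

# Illustrative roles mirroring the paper's setup (not the dataset schema).
ROLES = ("author", "reviewer", "decision_maker")

@dataclass
class Turn:
    role: str
    text: str

@dataclass
class ReviewDialogue:
    paper_title: str
    turns: list = field(default_factory=list)

    def add_turn(self, role: str, text: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.turns.append(Turn(role, text))

    def context(self) -> str:
        # Flatten the history into one long-context prompt for an LLM.
        return "\n".join(f"[{t.role}] {t.text}" for t in self.turns)
```

Evaluating an LLM in one role then amounts to conditioning it on `context()` and comparing its next turn against the recorded one.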
Unveiling the Sentinels: Assessing AI Performance in Cybersecurity Peer Review
Niu, Liang, Xue, Nian, Pöpper, Christina
Peer review is the method employed by the scientific community for evaluating research advancements. In the field of cybersecurity, the practice of double-blind peer review is the de facto standard. This paper touches on the holy grail of peer reviewing and aims to shed light on the performance of AI in reviewing for academic security conferences. Specifically, we investigate the predictability of reviewing outcomes by comparing the results obtained from human reviewers and machine-learning models. To facilitate our study, we construct a comprehensive dataset by collecting thousands of papers from renowned computer science conferences and the arXiv preprint website. Based on the collected data, we evaluate the prediction capabilities of ChatGPT and a two-stage classification approach based on the Doc2Vec model with various classifiers. In our experimental evaluation of review-outcome prediction, the Doc2Vec-based approach performs significantly better than ChatGPT, achieving an accuracy of over 90%. While analyzing the experimental results, we identify the potential advantages and limitations of the tested ML models. We explore areas within the paper-reviewing process that can benefit from automated support approaches, while also recognizing the irreplaceable role of human intellect in certain aspects that cannot be matched by state-of-the-art AI techniques.
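The two-stage idea (first embed each paper, then classify the embedding) can be sketched without the actual model. Below, a toy deterministic bag-of-words hash stands in for Doc2Vec inference, and a nearest-centroid rule stands in for the second-stage classifier; both substitutions are assumptions made to keep the sketch self-contained.

```python
from math import sqrt

def embed(text, dims=8):
    """Stage 1 stand-in for Doc2Vec inference: hash tokens into a
    fixed-size bag-of-words vector and L2-normalize it."""
    v = [0.0] * dims
    for tok in text.lower().split():
        v[sum(ord(c) for c in tok) % dims] += 1.0
    norm = sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def train_centroids(texts, labels):
    """Stage 2 training: average the embeddings per label."""
    sums, counts = {}, {}
    for text, label in zip(texts, labels):
        vec = embed(text)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [x / counts[lbl] for x in acc] for lbl, acc in sums.items()}

def predict(text, centroids):
    """Assign the label of the nearest class centroid."""
    vec = embed(text)
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2
                                   for a, b in zip(vec, centroids[lbl])))
```

The study itself uses trained Doc2Vec vectors with various classifiers; the point of the sketch is only the pipeline shape, embed then classify.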
GPT4 is Slightly Helpful for Peer-Review Assistance: A Pilot Study
In this pilot study, we investigate the use of GPT-4 to assist in the peer-review process. Our key hypothesis was that GPT-generated reviews could achieve comparable helpfulness to human reviews. By comparing reviews generated by both human reviewers and GPT models for academic papers submitted to a major machine learning conference, we provide initial evidence that artificial intelligence can contribute effectively to the peer-review process. We also perform robustness experiments with inserted errors to understand which parts of a paper the model tends to focus on. Our findings open new avenues for leveraging machine learning tools to address resource constraints in peer review. The results also shed light on potential enhancements to the review process and lay the groundwork for further research on scaling oversight in a domain where human feedback is an increasingly scarce resource.
How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?
Rastogi, Charvi, Stelmakh, Ivan, Beygelzimer, Alina, Dauphin, Yann N., Liang, Percy, Vaughan, Jennifer Wortman, Xue, Zhenyu, Daumé, Hal III, Pierson, Emma, Shah, Nihar B.
How do author perceptions match up to the outcomes of the peer-review process and the perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based on scientific contribution, and (iii) the change in their perception about their own papers after seeing the reviews. The salient results are: (1) Authors have roughly a three-fold overestimate of the acceptance probability of their papers: The median prediction is 70% for an approximately 25% acceptance rate. (2) Female authors exhibit a marginally higher (statistically significant) miscalibration than male authors; predictions of authors invited to serve as meta-reviewers or reviewers are similarly calibrated, but better than those of authors who were not invited to review. (3) Authors' relative rankings of the scientific contribution of two submissions they made generally agree (93%) with their predicted acceptance probabilities, but in a notable 7% of responses, authors expect their better paper to face a worse outcome. (4) The author-provided rankings disagreed with the peer-review decisions about a third of the time; when co-authors ranked their jointly authored papers, co-authors disagreed at a similar rate -- about a third of the time. (5) At least 30% of respondents, for both accepted and rejected papers, said that their perception of their own paper improved after the review process. Stakeholders in peer review should take these findings into account when setting their expectations of peer review.
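The headline "three-fold overestimate" follows directly from the two numbers in the abstract:

```python
median_prediction = 0.70  # authors' median predicted acceptance probability
acceptance_rate = 0.25    # approximate NeurIPS 2021 acceptance rate

# Ratio of what authors expected to what the venue actually accepts.
overestimate = median_prediction / acceptance_rate
print(round(overestimate, 1))  # 2.8, i.e. roughly three-fold
```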
Should AI have a role in assessing research quality?
CERN, Europe's particle-physics laboratory, produces vast amounts of data, which are stored at its computer centre (pictured) and analysed with the help of artificial intelligence (AI). UK funders want to know whether AI could also assist in peer reviewing thousands of research outputs for nationwide quality audits. Credit: Dean Mouhtaropoulos/Getty

Efforts to ease the workloads of peer reviewers by using artificial intelligence (AI) are gathering pace -- with one country's main research-evaluation exercise actively looking into ways of harnessing the technology. A study commissioned by the United Kingdom's main public research-funding bodies is examining how algorithms can assist in conducting peer review on journal articles submitted to the UK's Research Excellence Framework (REF). The REF, a national quality audit that measures the impact of research carried out at UK higher-education institutions, is a huge undertaking. In the latest iteration, the results of which were published in May 2022, more than 185,000 research outputs were evaluated from more than 76,000 academics based at 157 UK institutions.
Challenges, Experiments, and Computational Solutions in Peer Review
While researchers are trained to do research, there is little training for peer review. Several initiatives and experiments have looked to address this challenge. Recently, the ICML 2020 conference adopted a method to select and then mentor junior reviewers, who would not have been asked to review otherwise, with the motivation of expanding the reviewer pool to address the large volume of submissions.[43] An analysis of their reviews revealed that the junior reviewers were more engaged through various stages of the process compared to conventional reviewers. Moreover, the conference asked meta-reviewers to rate all reviews, and 30% of reviews written by junior reviewers received the highest rating from meta-reviewers, in contrast to 14% for the main pool. Training reviewers at the beginning of their careers is a good start but may not be enough. There is some evidence[8] that the quality of an individual's reviews falls over time, at a slow but steady rate, possibly because of increasing time constraints or in reaction to poor-quality reviews they themselves receive.
Institutionalising Ethics in AI through Broader Impact Requirements
Prunkl, Carina, Ashurst, Carolyn, Anderljung, Markus, Webb, Helena, Leike, Jan, Dafoe, Allan
Turning principles into practice is one of the most pressing challenges of artificial intelligence (AI) governance. In this article, we reflect on a novel governance initiative by one of the world's largest AI conferences. In 2020, the Conference on Neural Information Processing Systems (NeurIPS) introduced a requirement for submitting authors to include a statement on the broader societal impacts of their research. Drawing insights from similar governance initiatives, including institutional review boards (IRBs) and impact requirements for funding applications, we investigate the risks, challenges and potential benefits of such an initiative. Among the challenges, we list a lack of recognised best practice and procedural transparency, researcher opportunity costs, institutional and social pressures, cognitive biases, and the inherently difficult nature of the task. The potential benefits, on the other hand, include improved anticipation and identification of impacts, better communication with policy and governance experts, and a general strengthening of the norms around responsible research. To maximise the chance of success, we recommend measures to increase transparency, improve guidance, create incentives to engage earnestly with the process, and facilitate public deliberation on the requirement's merits and future. Perhaps the most important contribution of this analysis is the insight it offers into effective community-based governance and the role and responsibility of the AI research community more broadly.