AITopics | review score

Collaborating Authors

review score

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review

Su, Buxin, Collina, Natalie, Wen, Garrett, Li, Didong, Cho, Kyunghyun, Fan, Jianqing, Zhao, Bingxin, Su, Weijie

arXiv.org Artificial IntelligenceNov-26-2025

Peer review in academic research aims not only to ensure factual correctness but also to identify work of high scientific potential that can shape future research directions. This task is especially critical in fast-moving fields such as artificial intelligence (AI), yet it has become increasingly difficult given the rapid growth of submissions. In this paper, we investigate an underexplored measure for identifying high-impact research: authors' own rankings of their multiple submissions to the same AI conference. Grounded in game-theoretic reasoning, we hypothesize that self-rankings are informative because authors possess unique understanding of their work's conceptual depth and long-term promise. To test this hypothesis, we conducted a large-scale experiment at a leading AI conference, where 1,342 researchers self-ranked their 2,592 submissions by perceived quality. Tracking outcomes over more than a year, we found that papers ranked highest by their authors received twice as many citations as their lowest-ranked counterparts; self-rankings were especially effective at identifying highly cited papers (those with over 150 citations). Moreover, we showed that self-rankings outperformed peer review scores in predicting future citation counts. Our results remained robust after accounting for confounders such as preprint posting time and self-citations. Together, these findings demonstrate that authors' self-rankings provide a reliable and valuable complement to peer review for identifying and elevating high-impact research in AI.

artificial intelligence, citation count, submission, (18 more...)

arXiv.org Artificial Intelligence

2510.02143

Country: North America > United States (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

From Authors to Reviewers: Leveraging Rankings to Improve Peer Review

Wang, Weichen, Shi, Chengchun

arXiv.org Artificial IntelligenceOct-28-2025

This paper is a discussion of the 2025 JASA discussion paper by Su et al. (2025). We would like to congratulate the authors on conducting a comprehensive and insightful empirical investigation of the 2023 ICML ranking data. The review quality of machine learning (ML) conferences has become a big concern in recent years, due to the rapidly growing number of submitted manuscripts. In this discussion, we propose an approach alternative to Su et al. (2025) that leverages ranking information from reviewers rather than authors. We simulate review data that closely mimics the 2023 ICML conference submissions. Our results show that (i) incorporating ranking information from reviewers can significantly improve the evaluation of each paper's quality, often outperforming the use of ranking information from authors alone; and (ii) combining ranking information from both reviewers and authors yields the most accurate evaluation of submitted papers in most scenarios.

artificial intelligence, machine learning, reviewer, (15 more...)

arXiv.org Artificial Intelligence

2510.21726

Genre: Research Report > New Finding (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

LLM-REVal: Can We Trust LLM Reviewers Yet?

Li, Rui, Gu, Jia-Chen, Kung, Po-Nien, Xia, Heming, liu, Junfeng, Kong, Xiangwen, Sui, Zhifang, Peng, Nanyun

arXiv.org Artificial IntelligenceOct-15-2025

The rapid advancement of large language models (LLMs) has inspired researchers to integrate them extensively into the academic workflow, potentially reshaping how research is practiced and reviewed. While previous studies highlight the potential of LLMs in supporting research and peer review, their dual roles in the academic workflow and the complex interplay between research and review bring new risks that remain largely underexplored. In this study, we focus on how the deep integration of LLMs into both peer-review and research processes may influence scholarly fairness, examining the potential risks of using LLMs as reviewers by simulation. This simulation incorporates a research agent, which generates papers and revises, alongside a review agent, which assesses the submissions. Based on the simulation results, we conduct human annotations and identify pronounced misalignment between LLM-based reviews and human judgments: (1) LLM reviewers systematically inflate scores for LLM-authored papers, assigning them markedly higher scores than human-authored ones; (2) LLM reviewers persistently underrate human-authored papers with critical statements (e.g., risk, fairness), even after multiple revisions. Our analysis reveals that these stem from two primary biases in LLM reviewers: a linguistic feature bias favoring LLM-generated writing styles, and an aversion toward critical statements. These results highlight the risks and equity concerns posed to human authors and academic research if LLMs are deployed in the peer review cycle without adequate caution. On the other hand, revisions guided by LLM reviews yield quality gains in both LLM-based and human evaluations, illustrating the potential of the LLMs-as-reviewers for early-stage researchers and enhancing low-quality papers.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.12367

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(5 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

NAIPv2: Debiased Pairwise Learning for Efficient Paper Quality Estimation

Zhao, Penghai, Tian, Jinyu, Xing, Qinghua, Zhang, Xin, Li, Zheng, Qian, Jianjun, Cheng, Ming-Ming, Li, Xiang

arXiv.org Artificial IntelligenceOct-2-2025

The ability to estimate the quality of scientific papers is central to how both humans and AI systems will advance scientific knowledge in the future. However, existing LLM-based estimation methods suffer from high inference cost, whereas the faster direct score regression approach is limited by scale inconsistencies. We present NAIPv2, a debiased and efficient framework for paper quality estimation. NAIPv2 employs pairwise learning within domain-year groups to reduce inconsistencies in reviewer ratings and introduces the Review Tendency Signal (RTS) as a probabilistic integration of reviewer scores and confidences. To support training and evaluation, we further construct NAIDv2, a large-scale dataset of 24,276 ICLR submissions enriched with metadata and detailed structured content. Trained on pairwise comparisons but enabling efficient pointwise prediction at deployment, NAIPv2 achieves state-of-the-art performance (78.2% AUC, 0.432 Spearman), while maintaining scalable, linear-time efficiency at inference. Notably, on unseen NeurIPS submissions, it further demonstrates strong generalization, with predicted scores increasing consistently across decision categories from Rejected to Oral. These findings establish NAIPv2 as a debiased and scalable framework for automated paper quality estimation, marking a step toward future scientific intelligence systems. Code and dataset are released at sway.cloud.microsoft/Pr42npP80MfPhvj8.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2509.25179

Country:

Asia > China (0.46)
North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback

Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications

Keuper, Janis

arXiv.org Artificial IntelligenceSep-26-2025

The ongoing intense discussion on rising LLM usage in the scientificpeer-review process has recently been mingled by reports of authors using hi dden prompt injections to manipulate review scores. Since the existence of su ch "attacks" - although seen by some commentators as "self-defense" - would have a great impact on the further debate, this paper investigates the practicability and technical success of the described manipulations. Our systematic evaluation uses 1k reviews of 2024 ICLR papers generated by a wide range of LLMs shows two distinct results: I) very simple prompt injections are indeed highly effective, reaching up to 100% acceptance scores. II) LLM reviews are generally biased toward acceptance (>95% in many models). Both results have great impact on the ongoing discussionson LLM usage in peer-review.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2509.10248

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Supplementary Materials A Causal Concept Effects and Metrics for Explanation Methods

Neural Information Processing SystemsAug-15-2025, 18:11:15 GMT

Data do not materialize out of thin air. Rather, data are generated from real-world processes with complex causal structures we do not observe directly. G nor can we observe both interventions for the same subject. For example, in the context of CEBaB, we might ask 1. Each of the above questions requires the estimation of a different theoretical quantity.

causal concept effect, concept effect, icace 0, (13 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

Add feedback

Automatic Evaluation Metrics for Artificially Generated Scientific Research

Höpner, Niklas, Eshuijs, Leon, Alivanistos, Dimitrios, Zamprogno, Giacomo, Tiddi, Ilaria

arXiv.org Artificial IntelligenceFeb-14-2025

Foundation models are increasingly used in scientific research, but evaluating AI-generated scientific work remains challenging. While expert reviews are costly, large language models (LLMs) as proxy reviewers have proven to be unreliable. To address this, we investigate two automatic evaluation metrics, specifically citation count prediction and review score prediction. We parse all papers of OpenReview and augment each submission with its citation count, reference, and research hypothesis. Our findings reveal that citation count prediction is more viable than review score prediction, and predicting scores is more difficult purely from the research hypothesis than from the full paper. Furthermore, we show that a simple prediction model based solely on title and abstract outperforms LLM-based reviewers, though it still falls short of human-level consistency.

dataset, research hypothesis, submission, (13 more...)

arXiv.org Artificial Intelligence

2503.05712

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(17 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reviews: Joint Optimization of Tree-based Index and Deep Model for Recommender Systems

Neural Information Processing SystemsJan-22-2025, 02:23:13 GMT

The review scores were somewhat borderline, but overall slightly above the acceptance threshold. There was some disagreement among the reviewers, following which a discussion was initiated. The rebuttal largely addresses the concerns of R1 (the most negative review), and in the metareviewer's opinion does a reasonable job of addressing these concerns, which are mostly clarifications regarding the performance of the algorithm. Positively, the reviewers mostly concur that the method, while fairly straightforward, offers significant improvements over existing techniques. After discussion there was some positive movement in review scores resulting in a positive consensus among reviewers.

joint optimization, recommender system, tree-based index and deep model, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.40)

Add feedback

The 12 best gadgets we reviewed this year

EngadgetDec-27-2024, 17:30:24 GMT

I've lost count of the number of things we reviewed this year at Engadget. In 2024, the types of products we tested ranged from the typical phones, laptops and headphones to AI wearables, robotic lawnmowers and handheld gaming consoles, alongside games and shows. It can feel hard to keep track of it all, but thankfully, our scoring system helps us highlight the best (and the worst) devices each year. Our team of reviewers and editors evaluate products based on their performance, value and how they hold up against the competition, and at least two people weigh in on every score before it's published. If something gets a result of 80 and up, it's considered a "Recommended" product, while those scoring 90 and more are awarded "Editors' Choice." The latter means they're the best in their class, beating out most of the competition.

battery life, gadget, iphone 16, (16 more...)

Engadget

Country: North America > United States (0.14)

Industry: