Exploring the use of AI authors and reviewers at Agents4Science

Bianchi, Federico, Queen, Owen, Thakkar, Nitya, Sun, Eric, Zou, James

arXiv.org Artificial Intelligence

There is growing interest in using AI agents for scientific research, yet fundamental questions remain about their capabilities as scientists and reviewers. To explore these questions, we organized Agents4Science, the first conference in which AI agents serve as both primary authors and reviewers, with humans as co-authors and co-reviewers. Here, we discuss the key learnings from the conference and their implications for human-AI collaboration in science.


Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects

Begolli, Igli, Aksoy, Meltem, Neider, Daniel

arXiv.org Artificial Intelligence

Code review is essential for maintaining software quality but is often time-consuming and cognitively demanding, especially in industrial environments. Recent advancements in language models (LMs) have opened new avenues for automating core review tasks. This study presents an empirical evaluation of the impact of monolingual fine-tuning on the performance of open-source LMs across three key automated code review tasks: Code Change Quality Estimation, Review Comment Generation, and Code Refinement. We fine-tuned three distinct models, CodeReviewer, CodeLlama-7B, and DeepSeek-R1-Distill, on a C#-specific dataset combining public benchmarks with industrial repositories. Our study investigates how different configurations of programming languages and natural languages in the training data affect LM performance, particularly in comment generation. Additionally, we benchmark the fine-tuned models against an automated software analysis tool (ASAT) and human reviewers to evaluate their practical utility in real-world settings. Our results show that monolingual fine-tuning improves model accuracy and relevance compared to multilingual baselines. While LMs can effectively support code review workflows, especially for routine or repetitive tasks, human reviewers remain superior in handling semantically complex or context-sensitive changes. Our findings highlight the importance of language alignment and task-specific adaptation in optimizing LMs for automated code review.
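
To make the fine-tuning setup concrete, here is a minimal sketch of monolingual (C#-only) fine-tuning for review comment generation in a HuggingFace-style causal-LM pipeline. The file name csharp_reviews.jsonl, the record fields, the prompt format, and all hyperparameters are illustrative assumptions rather than the paper's actual configuration; in practice a 7B model would typically also require parameter-efficient methods such as LoRA.

```python
# Hypothetical sketch: monolingual fine-tuning of a causal LM for review
# comment generation on C#-only data. Dataset path, field names, prompt
# format, and hyperparameters are illustrative, not the paper's setup.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

MODEL = "codellama/CodeLlama-7b-hf"          # one of the three models studied
tok = AutoTokenizer.from_pretrained(MODEL)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL)

# Each record is assumed to hold a code diff and a human review comment.
data = load_dataset("json", data_files="csharp_reviews.jsonl", split="train")

def to_text(example):
    # Concatenate the diff and the target comment into one training sequence.
    text = (f"### Diff:\n{example['diff']}\n"
            f"### Review comment:\n{example['comment']}")
    return tok(text, truncation=True, max_length=1024)

tokenized = data.map(to_text, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="codellama-csharp-review",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal) language modeling labels.
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
```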


Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis

Zhang, Wenqing, Nguyen, Trang, Stuart, Elizabeth A., Chen, Yiqun T.

arXiv.org Artificial Intelligence

Systematic reviews are crucial for synthesizing scientific evidence but remain labor-intensive, especially when extracting detailed methodological information. Large language models (LLMs) offer potential for automating methodological assessments, promising to transform evidence synthesis. Here, using causal mediation analysis as a representative methodological domain, we benchmarked state-of-the-art LLMs against expert human reviewers across 180 full-text scientific articles. Model performance closely correlated with human judgments (accuracy correlation 0.71; F1 correlation 0.97), achieving near-human accuracy on straightforward, explicitly stated methodological criteria. However, accuracy sharply declined on complex, inference-intensive assessments, lagging expert reviewers by up to 15%. Errors commonly resulted from superficial linguistic cues -- for instance, models frequently misinterpreted keywords like "longitudinal" or "sensitivity" as automatic evidence of rigorous methodological approaches, leading to systematic misclassifications. Longer documents yielded lower model accuracy, whereas publication year showed no significant effect. Our findings highlight an important pattern for practitioners using LLMs for methods review and synthesis from full texts: current LLMs excel at identifying explicit methodological features but require human oversight for nuanced interpretations. Integrating automated information extraction with targeted expert review thus provides a promising approach to enhance efficiency and methodological rigor in evidence synthesis across diverse scientific fields.
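
As a rough illustration of the reported comparison, the sketch below correlates per-criterion model accuracy with expert-reviewer accuracy using Pearson's r, the statistic behind the quoted accuracy and F1 correlations. All numeric values are placeholders, not the study's data.

```python
# Hypothetical sketch: correlating per-criterion model performance with
# expert-reviewer performance. The numbers below are placeholders only.
import numpy as np
from scipy.stats import pearsonr

# One value per methodological criterion (e.g., "longitudinal design reported",
# "sensitivity analysis conducted", ...), for the model and the human experts.
model_accuracy = np.array([0.92, 0.88, 0.71, 0.65, 0.80])
human_accuracy = np.array([0.95, 0.90, 0.86, 0.80, 0.83])

r_acc, p_acc = pearsonr(model_accuracy, human_accuracy)
print(f"accuracy correlation r = {r_acc:.2f} (p = {p_acc:.3f})")

# The shortfall on harder, inference-intensive criteria is the per-criterion gap:
gap = human_accuracy - model_accuracy
print("largest per-criterion gap:", gap.max())  # e.g., ~0.15 would mean 15 points
```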


Supplementary Material for: T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models

Neural Information Processing Systems

Warning: This paper contains data and model outputs which are offensive in nature. The study was submitted to an Institutional Review Board (IRB) and obtained an exempt decision. Additionally, potential bias may arise from the high cultural specificity of the human reviewers, for instance in how categories such as "explicit sexual content" are defined. Each video was evaluated by at least three volunteers, and following the initial assessment, a secondary cross-validation was conducted.


Reading the post-riot posts: how we traced far-right radicalisation across 51,000 Facebook messages

The Guardian

Jail sentences for those who made posts about the UK riots in summer 2024 have become a flashpoint for online criticism. More than 1,100 people have been charged in connection with the summer 2024 riots. A small number of them were charged for offences related to their online activity. Their jail sentences, which ranged from 12 weeks to seven years, became a flashpoint for online criticism.


From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review

Zhang, Yaohui, Zhang, Haijing, Ji, Wenlong, Hua, Tianyu, Haber, Nick, Cao, Hancheng, Liang, Weixin

arXiv.org Artificial Intelligence

The advent of large language models (LLMs) offers unprecedented opportunities to reimagine peer review beyond the constraints of traditional workflows. Despite these opportunities, prior efforts have largely focused on replicating traditional review workflows with LLMs serving as direct substitutes for human reviewers, while limited attention has been given to exploring new paradigms that fundamentally rethink how LLMs can participate in the academic review process. In this paper, we introduce and explore a novel mechanism that employs LLM agents to perform pairwise comparisons among manuscripts instead of individual scoring. By aggregating outcomes from a substantial number of pairwise evaluations, this approach enables a more accurate and robust measure of relative manuscript quality. Our experiments demonstrate that this comparative approach significantly outperforms traditional rating-based methods in identifying high-impact papers. However, our analysis also reveals emergent biases in the selection process, notably a reduced novelty in research topics and an increased institutional imbalance. These findings highlight both the transformative potential of rethinking peer review with LLMs and critical challenges that future systems must address to ensure equity and diversity.
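
The abstract does not specify how the pairwise outcomes are aggregated; one common choice for turning pairwise wins into relative quality scores is a Bradley-Terry fit, sketched below purely as an illustration of the idea rather than as the paper's method.

```python
# Hypothetical sketch: aggregating pairwise LLM comparisons into relative
# manuscript-quality scores with a simple Bradley-Terry fit (MM updates).
# The paper's actual aggregation scheme may differ.
import numpy as np

def bradley_terry(n_items, comparisons, iters=200):
    """comparisons: list of (winner_idx, loser_idx) pairwise outcomes."""
    wins = np.zeros((n_items, n_items))
    for w, l in comparisons:
        wins[w, l] += 1
    strength = np.ones(n_items)
    for _ in range(iters):
        for i in range(n_items):
            num = wins[i].sum()  # total wins of item i
            denom = sum((wins[i, j] + wins[j, i]) / (strength[i] + strength[j])
                        for j in range(n_items) if j != i)
            if denom > 0:
                strength[i] = num / denom
        strength /= strength.sum()  # fix the overall scale
    return strength

# Example: paper 0 beats 1 and 2; paper 1 beats 2 -> ranking 0 > 1 > 2.
scores = bradley_terry(3, [(0, 1), (0, 2), (1, 2), (0, 1)])
print(np.argsort(-scores))  # indices sorted from strongest to weakest
```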


ReviewRL: Towards Automated Scientific Review with RL

Zeng, Sihang, Tian, Kai, Zhang, Kaiyan, Wang, Yuru, Gao, Junqi, Liu, Runze, Yang, Sa, Li, Jingxuan, Long, Xinwei, Ma, Jiaheng, Qi, Biqing, Zhou, Bowen

arXiv.org Artificial Intelligence

Peer review is essential for scientific progress but faces growing challenges due to increasing submission volumes and reviewer fatigue. Existing automated review approaches struggle with factual accuracy, rating consistency, and analytical depth, often generating superficial or generic feedback lacking the insights characteristic of high-quality human reviews. We introduce ReviewRL, a reinforcement learning framework for generating comprehensive and factually grounded scientific paper reviews. Our approach combines: (1) an ArXiv-MCP retrieval-augmented context generation pipeline that incorporates relevant scientific literature, (2) supervised fine-tuning that establishes foundational reviewing capabilities, and (3) a reinforcement learning procedure with a composite reward function that jointly enhances review quality and rating accuracy. Experiments on ICLR 2025 papers demonstrate that ReviewRL significantly outperforms existing methods across both rule-based metrics and model-based quality assessments. ReviewRL establishes a foundational framework for RL-driven automatic critique generation in scientific discovery, demonstrating promising potential for future development in this domain. The implementation of ReviewRL will be released at GitHub.
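
The exact form of the composite reward is not given in the abstract; the sketch below shows one plausible way a review-quality term and a rating-accuracy term could be combined, with the weighting and component definitions entirely assumed.

```python
# Hypothetical sketch of a composite RL reward combining a review-quality
# term with a rating-accuracy term. The abstract only says the reward jointly
# targets both; alpha, the scale, and the component definitions are assumptions.
def composite_reward(quality_score: float,
                     predicted_rating: float,
                     true_rating: float,
                     alpha: float = 0.5,
                     rating_scale: float = 10.0) -> float:
    """quality_score: model-based review-quality judgment in [0, 1].
    predicted_rating / true_rating: paper scores on, e.g., a 1-10 scale."""
    # Rating accuracy: 1 when the predicted score matches the reference,
    # decaying linearly with the absolute error.
    rating_acc = max(0.0, 1.0 - abs(predicted_rating - true_rating) / rating_scale)
    return alpha * quality_score + (1.0 - alpha) * rating_acc

# Example: a well-written review (quality 0.8) that rates a paper 6 when the
# reference score is 5 earns a reward of 0.5 * 0.8 + 0.5 * 0.9 = 0.85.
print(composite_reward(0.8, 6, 5))
```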


How to disable Gemini AI on Android and keep control of your apps

FOX News

Fox News host Greg Gutfeld and guests discuss the reportedly woke answers from Google's AI chatbot Gemini on 'Gutfeld!' Google is making a push to ensure its AI, Gemini, is tightly integrated with Android systems by granting it access to core apps like WhatsApp, Messages, and Phone. The rollout of this change started on July 7, 2025, and it may override older privacy configurations unless you know how to disable Gemini on Android. Here's what you need to know.


Moderating Harm: Benchmarking Large Language Models for Cyberbullying Detection in YouTube Comments

Muminovic, Amel

arXiv.org Artificial Intelligence

As online platforms grow, comment sections increasingly host harassment that undermines user experience and well-being. This study benchmarks three leading large language models, OpenAI GPT-4.1, Google Gemini 1.5 Pro, and Anthropic Claude 3 Opus, on a corpus of 5,080 YouTube comments sampled from high-abuse threads in gaming, lifestyle, food vlog, and music channels. The dataset comprises 1,334 harmful and 3,746 non-harmful messages in English, Arabic, and Indonesian, annotated independently by two reviewers with substantial agreement (Cohen's kappa = 0.83). Using a unified prompt and deterministic settings, GPT-4.1 achieved the best overall balance with an F1 score of 0.863, precision of 0.887, and recall of 0.841. Gemini flagged the highest share of harmful posts (recall = 0.875) but its precision fell to 0.767 due to frequent false positives. Claude delivered the highest precision at 0.920 and the lowest false-positive rate of 0.022, yet its recall dropped to 0.720. Qualitative analysis showed that all three models struggle with sarcasm, coded insults, and mixed-language slang. These results underscore the need for moderation pipelines that combine complementary models, incorporate conversational context, and fine-tune for under-represented languages and implicit abuse. A de-identified version of the dataset and full prompts is publicly released to promote reproducibility and further progress in automated content moderation.
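
As a quick sanity check, the reported F1 for GPT-4.1 follows from the reported precision and recall via the harmonic mean; the short sketch below derives it. The F1 values printed for Gemini and Claude are computed the same way and are not quoted in the abstract.

```python
# Quick check that F1 is the harmonic mean of the precision and recall
# values reported in the abstract above.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

for name, p, r in [("GPT-4.1", 0.887, 0.841),
                   ("Gemini 1.5 Pro", 0.767, 0.875),
                   ("Claude 3 Opus", 0.920, 0.720)]:
    print(f"{name}: F1 = {f1(p, r):.3f}")
# GPT-4.1 -> 0.863, matching the reported value; the other two F1 scores are
# derived here for illustration, not quoted from the abstract.
```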