Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection

Piot, Paloma, Otero, David, Martín-Rodilla, Patricia, Parapar, Javier

arXiv.org Artificial Intelligence

Hate speech spreads widely online, harming individuals and communities, making automatic detection essential for large-scale moderation, yet detecting it remains difficult. Part of the challenge lies in subjectivity: what one person flags as hate speech, another may see as benign. Traditional annotation agreement metrics, such as Cohen's $\kappa$, oversimplify this disagreement, treating it as an error rather than meaningful diversity. Meanwhile, Large Language Models (LLMs) promise scalable annotation, but prior studies demonstrate that they cannot fully replace human judgement, especially in subjective tasks. In this work, we reexamine LLM reliability using a subjectivity-aware framework, cross-Rater Reliability (xRR), revealing that even under this fairer lens, LLMs still diverge from humans. Yet this limitation opens an opportunity: we find that LLM-generated annotations can reliably reflect performance trends across classification models, correlating with human evaluations. We test this by examining whether LLM-generated annotations preserve the relative ordering of model performance derived from human evaluation (i.e., whether models ranked as more reliable by human annotators keep the same order when evaluated with LLM-generated labels). Our results show that, although LLMs differ from humans at the instance level, they reproduce similar ranking and classification patterns, suggesting their potential as proxy evaluators. While not a substitute for human annotators, they might serve as a scalable proxy for evaluation in subjective NLP tasks.
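The ranking-preservation test the abstract describes can be sketched as a rank correlation between model scores computed under human labels and under LLM labels. A minimal illustration, with made-up model F1 scores and a hand-rolled Spearman correlation (no ties assumed):

```python
# Hypothetical sketch: do LLM-generated labels preserve the
# human-derived ranking of classifier performance? All scores
# below are illustrative, not from the paper.

def rank(values):
    """Return ranks (1 = best) for a list of scores, higher is better."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(x, y):
    """Spearman rank correlation between two score lists (no ties)."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# F1 of four hate-speech classifiers under human vs. LLM labels (made up)
human_f1 = [0.81, 0.74, 0.69, 0.62]
llm_f1 = [0.77, 0.73, 0.64, 0.60]

print(spearman(human_f1, llm_f1))  # 1.0: the ordering is preserved
```

A correlation near 1 means the LLM labels, even if they disagree instance by instance, still rank the models the same way human labels do.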


Artificial Intelligence Applications in Horizon Scanning for Infectious Diseases

Miles, Ian, Wakimoto, Mayumi, Meira, Wagner Jr., Paula, Daniela, Ticiane, Daylene, Rosa, Bruno, Biddulph, Jane, Georgiou, Stelios, Ermida, Valdir

arXiv.org Artificial Intelligence

This review explores the integration of Artificial Intelligence into Horizon Scanning, focusing on identifying and responding to emerging threats and opportunities linked to Infectious Diseases. We examine how AI tools can enhance signal detection, data monitoring, scenario analysis, and decision support. We also address the risks associated with AI adoption and propose strategies for effective implementation and governance. The findings contribute to the growing body of Foresight literature by demonstrating the potential and limitations of AI in Public Health preparedness.



Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

Mahdavi, Sadegh, Kisacanin, Branislav, Toshniwal, Shubham, Du, Wei, Moshkov, Ivan, Armstrong, George, Liao, Renjie, Thrampoulidis, Christos, Gitman, Igor

arXiv.org Artificial Intelligence

Large language models have achieved remarkable success on final-answer mathematical problems, largely due to the ease of applying reinforcement learning with verifiable rewards. However, the reasoning underlying these solutions is often flawed. Advancing to rigorous proof-based mathematics requires reliable proof verification capabilities. We begin by analyzing multiple evaluation setups and show that focusing on a single benchmark can lead to brittle or misleading conclusions. To address this, we evaluate both proof-based and final-answer reasoning to obtain a more reliable measure of model performance. We then scale two major generative verification methods (GenSelect and LLM-as-a-Judge) to millions of tokens and identify their combination as the most effective framework for solution verification and selection. We further show that the choice of prompt for LLM-as-a-Judge significantly affects the model's performance, but reinforcement learning can reduce this sensitivity. However, despite improving proof-level metrics, reinforcement learning does not enhance final-answer precision, indicating that current models often reward stylistic or procedural correctness rather than mathematical validity. Our results establish practical guidelines for designing and evaluating scalable proof-verification and selection systems.
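The verification-and-selection setup the abstract scales can be pictured as judge-based best-of-n selection: generate several candidate solutions, score each with a verifier, and keep the highest-scored one. A minimal sketch, where `judge_score` is a stub standing in for a real LLM-as-a-Judge call (prompting a judge model and parsing its verdict), not the paper's actual system:

```python
# Hypothetical best-of-n selection with a pluggable judge.
# `judge_score` is a placeholder; a real pipeline would query a
# judge LLM with a verification prompt and parse its score.

def judge_score(problem, solution):
    # Toy stand-in signal: prefer solutions that state a conclusion.
    return 1.0 if "therefore" in solution.lower() else 0.0

def select_best(problem, candidates, judge=judge_score):
    """Return the candidate the judge scores highest."""
    return max(candidates, key=lambda s: judge(problem, s))

candidates = [
    "The answer is 7.",
    "Since x + 3 = 10, therefore x = 7.",
]
print(select_best("Solve x + 3 = 10", candidates))
```

The paper's finding that judge prompts strongly affect results corresponds here to the choice of `judge_score`: the selection is only as good as the verifier's scoring signal.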



With the highest respect, we disagree with the judgement, due to nontrivial technical differences and results

Neural Information Processing Systems

We thank all reviewers for their helpful and constructive comments. We will further improve the paper in the final version. In particular, our contributions are: (1) We introduce generalization bounds of learning algorithms on various losses. Besides, it is nontrivial to analyze the relationship between HL and RL, especially for the second inequality. We will add these discussions in the final version. We will also make the comparison and statements clearer. Below, we discuss the pros and cons of each one in detail.