AITopics | Bali

Collaborating Authors

Bali

From Ground Truth to Measurement: A Statistical Framework for Human Labeling

Chew, Robert, Eckman, Stephanie, Kern, Christoph, Kreuter, Frauke

arXiv.org Machine Learning, Apr-10-2026

Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent interpretations, and simple mistakes. Machine learning research commonly treats all disagreement as noise, which obscures these distinctions and limits our understanding of what models actually learn. This paper reframes annotation as a measurement process and introduces a statistical framework for decomposing labeling outcomes into interpretable sources of variation: instance difficulty, annotator bias, situational noise, and relational alignment. The framework extends classical measurement-error models to accommodate both shared and individualized notions of truth, reflecting traditional and human label variation interpretations of error, and provides a diagnostic for assessing which regime better characterizes a given task. Applying the proposed model to a multi-annotator natural language inference dataset, we find empirical evidence for all four theorized components and demonstrate the effectiveness of our approach. We conclude with implications for data-centric machine learning and outline how this approach can guide the development of a more systematic science of labeling.
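
To make "decomposing labeling outcomes into sources of variation" concrete, here is a minimal sketch. It is not the authors' model; it is a generic two-way additive decomposition, assuming a complete items × annotators matrix of numeric labels (for example 0/1 for "entailment or not"). It separates an item effect (a rough proxy for instance difficulty) and an annotator effect (a rough proxy for annotator bias) from a residual. The paper's relational-alignment component, an item-by-annotator interaction, cannot be separated from situational noise in a design with one label per cell, so this sketch folds both into the residual. The function name `decompose` is hypothetical.

```python
from statistics import fmean


def decompose(labels):
    """Two-way additive decomposition of a complete items x annotators matrix.

    labels[i][a] is the numeric label annotator a gave to item i.
    Returns (grand_mean, item_effects, annotator_effects, residuals).
    """
    n_items = len(labels)
    n_annot = len(labels[0])
    grand = fmean(v for row in labels for v in row)
    # Item effect: how far an item's mean label sits from the grand mean
    item = [fmean(row) - grand for row in labels]
    # Annotator effect: how far an annotator's mean label sits from the grand mean
    annot = [fmean(labels[i][a] for i in range(n_items)) - grand
             for a in range(n_annot)]
    # Residual: what the additive item + annotator structure leaves unexplained
    # (interaction and noise are confounded here with one label per cell)
    resid = [[labels[i][a] - grand - item[i] - annot[a] for a in range(n_annot)]
             for i in range(n_items)]
    return grand, item, annot, resid
```

Comparing the spread of the item effects with the spread of the annotator effects gives a rough, descriptive sense of whether item difficulty or annotator bias dominates a given labeling task.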

annotator, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2604.07591

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland (0.04)
(4 more...)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Reinforcement Learning from Human Feedback: A Statistical Perspective

Liu, Pangpang, Shi, Chengchun, Sun, Will Wei

arXiv.org Machine Learning, Apr-6-2026

Reinforcement learning from human feedback (RLHF) has emerged as a central framework for aligning large language models (LLMs) with human preferences. Despite its practical success, RLHF raises fundamental statistical questions because it relies on noisy, subjective, and often heterogeneous feedback to learn reward models and optimize policies. This survey provides a statistical perspective on RLHF, focusing primarily on the LLM alignment setting. We introduce the main components of RLHF, including supervised fine-tuning, reward modeling, and policy optimization, and relate them to familiar statistical ideas such as Bradley-Terry-Luce (BTL) model, latent utility estimation, active learning, experimental design, and uncertainty quantification. We review methods for learning reward functions from pairwise preference data and for optimizing policies through both two-stage RLHF pipelines and emerging one-stage approaches such as direct preference optimization. We further discuss recent extensions including reinforcement learning from AI feedback, inference-time algorithms, and reinforcement learning from verifiable rewards, as well as benchmark datasets, evaluation protocols, and open-source frameworks that support RLHF research. We conclude by highlighting open challenges in RLHF. An accompanying GitHub demo https://github.com/Pangpang-Liu/RLHF_demo illustrates key components of the RLHF pipeline.
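
The Bradley-Terry-Luce (BTL) model mentioned above is the standard link between pairwise preferences and scalar rewards: it assumes P(i preferred over j) = exp(r_i) / (exp(r_i) + exp(r_j)). The following is a minimal sketch of fitting BTL scores by gradient ascent on the pairwise log-likelihood. It assumes one free score per item; in RLHF reward modeling the score would instead come from a learned reward r(x, y) over prompts and responses. The names `btl_prob` and `fit_btl`, and the learning rate and step count, are illustrative choices rather than anything taken from the survey or its demo repository.

```python
import math


def btl_prob(r_i, r_j):
    # BTL: P(i preferred over j) = exp(r_i) / (exp(r_i) + exp(r_j))
    return 1.0 / (1.0 + math.exp(r_j - r_i))


def fit_btl(n_items, comparisons, lr=0.1, steps=2000):
    """Maximum-likelihood BTL scores from (winner, loser) index pairs."""
    r = [0.0] * n_items
    for _ in range(steps):
        grad = [0.0] * n_items
        for w, l in comparisons:
            p = btl_prob(r[w], r[l])
            # d/dr_w log p = 1 - p;  d/dr_l log p = -(1 - p)
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        r = [ri + lr * g for ri, g in zip(r, grad)]
        # Scores are identifiable only up to an additive constant; center them
        mean = sum(r) / n_items
        r = [ri - mean for ri in r]
    return r
```

For example, if item 0 beats item 1 in three of four comparisons, item 1 beats item 2 in three of four, and item 0 beats item 2 in three of four, then `fit_btl(3, comparisons)` ranks the items 0 > 1 > 2. Each pair needs wins in both directions for the maximum-likelihood scores to stay finite.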

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Machine Learning

2604.02507

Country:

Europe > Austria > Vienna (0.14)
Africa > South Africa (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report (1.00)
Overview (0.86)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices

Arias, Esteban Garces, Sapargali, Nurzhan, Heumann, Christian, Aßenmacher, Matthias

arXiv.org Machine Learning, Mar-20-2026

Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather than statistical frequency. This mismatch creates a truncation blind spot: contextually appropriate but statistically rare tokens remain accessible to humans yet unreachable by likelihood-based decoding. We hypothesize this contributes to the detectability of machine-generated text. Analyzing over 1.8 million texts across eight language models, five decoding strategies, and 53 hyperparameter configurations, we find that 8-18% of human-selected tokens fall outside typical truncation boundaries. Simple classifiers trained on predictability and lexical diversity achieve remarkable detection rates. Crucially, neither model scale nor architecture correlates strongly with detectability; truncation parameters account for most variance. Configurations achieving low detectability often produce incoherent text, indicating that evading detection and producing natural text are distinct objectives. These findings suggest detectability is enhanced by likelihood-based token selection, not merely a matter of model capability.
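
The "truncation blind spot" measurement can be illustrated with a minimal sketch. It assumes that, for each position in a human-written text, you have the model's next-token probability vector and the index of the token the human actually chose. It then counts how often that token falls outside the set a truncation rule would keep. Only top-k and nucleus (top-p) truncation are shown; contrastive search is not covered. The function names are hypothetical, and the probability vector in the test is a toy example, not data from the paper.

```python
def top_k_set(probs, k):
    # Indices of the k most probable tokens
    order = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)
    return set(order[:k])


def nucleus_set(probs, p):
    # Smallest set of most-probable tokens whose cumulative mass reaches p
    order = sorted(range(len(probs)), key=lambda t: probs[t], reverse=True)
    kept, mass = set(), 0.0
    for t in order:
        kept.add(t)
        mass += probs[t]
        if mass >= p:
            break
    return kept


def outside_rate(steps, truncate):
    """Fraction of human-chosen tokens excluded by a truncation rule.

    steps: list of (probs, human_token) pairs, one per text position.
    truncate: function mapping a probability vector to the kept token set.
    """
    excluded = sum(1 for probs, tok in steps if tok not in truncate(probs))
    return excluded / len(steps)
```

Usage: `outside_rate(steps, lambda pr: nucleus_set(pr, 0.9))`. Real decoders truncate and renormalize logits before sampling, but set membership is all that is needed to tell whether a human token was reachable at all.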

large language model, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2603.18482

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

20ffc2b42c7de4a1960cfdadf305bbe2-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-19-2026, 04:30:58 GMT

arxiv preprint arxiv, dataset, information, (13 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.46)

Industry: Social Sector (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

Pan, Dongwei

Neural Information Processing Systems, Feb-19-2026, 01:28:51 GMT

Synthesizing high-fidelity head avatars is a central problem for computer vision and graphics. While head avatar synthesis algorithms have advanced rapidly, the best ones still face great obstacles in real-world scenarios.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Industry:

Information Technology > Security & Privacy (0.46)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

fde7f40f8ced5735006810534dc66b33-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-18-2026, 19:52:05 GMT

arxiv preprint arxiv, large language model, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Yangsibo Huang, Noah A. Smith

Neural Information Processing Systems, Feb-18-2026, 19:09:03 GMT

When turned on, "GitHub Copilot checks code completion suggestions with their surrounding

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

South America > Peru (0.14)
North America > Belize (0.14)
North America > Mexico (0.14)
(9 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law > Intellectual Property & Technology Law (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (0.93)
Information Technology (0.93)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

f94cfd15db3f16ee7789b6b7e91ec476-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 18:50:41 GMT

information, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Saarland (0.04)
Asia > Philippines (0.04)
North America > Canada > Ontario > Toronto (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Leisure & Entertainment (0.45)
Information Technology (0.45)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

f606d45ae7b991988b6eea2af38b7057-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 17:36:30 GMT

kg-fit, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > Illinois (0.04)
North America > United States > District of Columbia > Washington (0.04)
(3 more...)

Genre:

Research Report > Experimental Study (0.93)
Workflow (0.68)
Overview (0.67)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.93)
Leisure & Entertainment > Sports (0.92)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
(3 more...)

DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph

Neural Information Processing Systems, Feb-18-2026, 17:34:55 GMT

The current paradigm of evaluating Large Language Models (LLMs) through static benchmarks comes with significant limitations, such as vulnerability to data contamination and a lack of adaptability to the evolving capabilities of LLMs.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Country: