AITopics | evaluator

Collaborating Authors

evaluator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets

Neural Information Processing SystemsMay-1-2026, 02:04:18 GMT

Language models can generate harmful and biased outputs and exhibit undesirable behavior according to a given cultural context. We propose a Process for Adapting Language Models to Society (PALMS) with ValuesTargeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative metrics with human evaluations that score output adherence to a target value, toxicity scoring on outputs; and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report (0.71)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Health & Medicine > Public Health (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

f323d594aa5d2c68154433a131c07959-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-30-2026, 07:10:50 GMT

gpt-3, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.67)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

General Machine Learning: Theory for Learning Under Variable Regimes

Osmani, Aomar

arXiv.org Machine LearningMar-25-2026

We study learning under regime variation, where the learner, its memory state, and the evaluative conditions may evolve over time. This paper is a foundational and structural contribution: its goal is to define the core learning-theoretic objects required for such settings and to establish their first theorem-supporting consequences. The paper develops a regime-varying framework centered on admissible transport, protected-core preservation, and evaluator-aware learning evolution. It records the immediate closure consequences of admissibility, develops a structural obstruction argument for faithful fixed-ontology reduction in genuinely multi-regime settings, and introduces a protected-stability template together with explicit numerical and symbolic witnesses on controlled subclasses, including convex and deductive settings. It also establishes theorem-layer results on evaluator factorization, morphisms, composition, and partial kernel-level alignment across semantically commensurable layers. A worked two-regime example makes the admissibility certificate, protected evaluative core, and regime-variation cost explicit on a controlled subclass. The symbolic component is deliberately restricted in scope: the paper establishes a first kernel-level compatibility result together with a controlled monotonic deductive witness. The manuscript should therefore be read as introducing a structured learning-theoretic framework for regime-varying learning together with its first theorem-supporting layer, not as a complete quantitative theory of all learning systems.

aomar osmani gml, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2603.2322

Country:

Europe > France > Normandy > Seine-Maritime > Rouen (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Industry: Education (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Add feedback

An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination

Shin, Minchul

arXiv.org Machine LearningMar-23-2026

AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.

agent, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

2603.17381

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

Add feedback

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses

Neural Information Processing SystemsMar-21-2026, 00:59:22 GMT

The rapid progress in Large Language Models (LLMs) poses potential risks such as generating unethical content. Assessing the values embedded in LLMs' generated responses can help expose their misalignment, but this relies on reference-free value evaluators, e.g.

artificial intelligence, large language model, natural language, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Neural Information Processing SystemsMar-20-2026, 19:43:22 GMT

Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can we keep improving the systems when their capabilities have surpassed the levels of humans?

artificial intelligence, easy-to-hard generalization, proceedings, (10 more...)

Neural Information Processing Systems

Genre: Research Report (0.61)

Technology: Information Technology > Artificial Intelligence (0.99)

Add feedback

Safety through feedback in Constrained RL

Neural Information Processing SystemsFeb-18-2026, 19:30:34 GMT

This feedback can be system generated or elicited from a human observing the training process. Previous approaches have not been able to scale to complex environments and are constrained to receiving feedback at the state level which can be expensive to collect. To this end, we introduce an approach that scales to more complex domains and extends beyond state-level feedback, thus, reducing the burden on the evaluator.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Add feedback

e2e06adf560b0706d3b1ddfca9f29756-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-18-2026, 11:05:58 GMT

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
(2 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Add feedback

e2e06adf560b0706d3b1ddfca9f29756-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-18-2026, 11:05:54 GMT

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Neural Information Processing SystemsFeb-18-2026, 11:05:02 GMT

As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.

caption, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.98)
Information Technology > Communications > Social Media > Crowdsourcing (0.82)

Add feedback