AITopics | true positive rate

Preference Learning with Lie Detectors can Induce Honesty or Evasion

Neural Information Processing Systems | Jun-22-2026, 10:41:17 GMT

As AI systems become more capable, deceptive behaviors can undermine evaluation and mislead users at deployment. Recent work has shown that lie detectors can accurately classify deceptive behavior, but they are not typically used in the training pipeline due to concerns around contamination and objective hacking. We examine these concerns by incorporating a lie detector into the labelling step of LLM post-training and evaluating whether the learned policy is genuinely more honest, or instead learns to fool the lie detector while remaining deceptive. Using DolusChat, a novel 65k-example dataset with paired truthful/deceptive responses, we identify three key factors that determine the honesty of learned policies: amount of exploration during preference learning, lie detector accuracy, and KL regularization strength. We find that preference learning with lie detectors and GRPO can lead to policies which evade lie detectors, with deception rates of over 85%. However, if the lie detector true positive rate (TPR) or KL regularization is sufficiently high, GRPO learns honest policies. In contrast, off-policy algorithms (DPO) consistently lead to deception rates under 25% for realistic TPRs. Our results illustrate a more complex picture than previously assumed: depending on the context, lie-detector-enhanced training can be a powerful tool for scalable oversight, or a counterproductive method encouraging undetectable misalignment.
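For readers unfamiliar with the metric, the detector's true positive rate mentioned in the abstract can be illustrated as follows. This is a minimal sketch under assumptions: the function name `true_positive_rate` and the toy labels are hypothetical and are not taken from the paper or from DolusChat.

```python
from typing import Sequence


def true_positive_rate(is_deceptive: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Fraction of truly deceptive responses that the detector flags as lies."""
    if len(is_deceptive) != len(flagged):
        raise ValueError("inputs must have the same length")
    positives = sum(is_deceptive)
    if positives == 0:
        raise ValueError("TPR is undefined without deceptive examples")
    # A true positive is a deceptive response that the detector flagged.
    caught = sum(1 for d, f in zip(is_deceptive, flagged) if d and f)
    return caught / positives


# Hypothetical toy data: 4 deceptive responses, 3 of which are flagged.
is_deceptive = [True, True, True, True, False, False]
flagged = [True, True, True, False, False, True]
tpr = true_positive_rate(is_deceptive, flagged)
```

Here the one missed deceptive response lowers the TPR, while the truthful response that was wrongly flagged does not affect it; that error would show up in the false positive rate instead.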

detector, large language model, machine learning, (19 more...)

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Instructional Material (1.00)

Industry:

Transportation (1.00)
Telecommunications (1.00)
Law (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

mhealth_ood_neurips_2021.pdf

Neural Information Processing Systems | Apr-24-2026, 21:49:17 GMT

artificial intelligence, machine learning, tnr, (17 more...)

Industry: Health & Medicine > Therapeutic Area (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

bb7946e7d85c81a9e69fee1cea4a087c-Paper.pdf

Neural Information Processing Systems | Feb-13-2026, 20:03:53 GMT

candidate model, experiment, relmulti, (14 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(3 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

74fc5575632191d96881d8015f79dde3-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 20:29:27 GMT

algorithm, dag constraint, graph, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)

2d16ad1968844a4300e9a490588ff9f8-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 23:05:49 GMT

main protease, molecule, scaffold, (12 more...)

Country: North America > United States (0.47)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)

mhealth_ood_neurips_2021.pdf

Neural Information Processing Systems | Feb-7-2026, 16:14:53 GMT

interface, tnr, tpr95, (15 more...)

Industry: Health & Medicine > Therapeutic Area (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Kernel Stein Tests for Multiple Model Comparison

Neural Information Processing Systems | Dec-25-2025, 22:22:17 GMT

We address the problem of non-parametric multiple model comparison: given $l$ candidate models, decide whether each candidate is as good as the best one(s) or worse than it. We propose two statistical tests, each controlling a different notion of decision errors. The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declared worse (false positive rate). The second test is based on multiple correction, and controls the proportion of models declared worse that are in fact as good as the best (false discovery rate). We prove that under appropriate conditions the first test can yield a higher true positive rate than the second. Experimental results on toy and real (CelebA, Chicago Crime data) problems show that the two tests have high true positive rates with well-controlled error rates. By contrast, the naive approach of choosing the model with the lowest score without correction leads to more false positives.
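The two error notions in this abstract differ only in their denominators, which a small example may make concrete. The sketch below is an illustration under assumptions, not the authors' tests: the function name `comparison_error_rates`, the choice to express the false positive quantity as a rate over the truly best models, and the toy decisions are all hypothetical.

```python
from typing import Dict, Sequence


def comparison_error_rates(
    truly_worse: Sequence[bool], declared_worse: Sequence[bool]
) -> Dict[str, float]:
    """Error rates for 'declared worse than the best' decisions over l models."""
    if len(truly_worse) != len(declared_worse):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(truly_worse, declared_worse))
    tp = sum(1 for w, d in pairs if w and d)      # worse models correctly declared worse
    fp = sum(1 for w, d in pairs if not w and d)  # best models wrongly declared worse
    n_worse = sum(truly_worse)
    n_best = len(pairs) - n_worse
    n_declared = sum(declared_worse)
    return {
        # FPR: wrongly rejected best models, relative to all best models.
        "fpr": fp / n_best if n_best else float("nan"),
        # FDR: wrongly rejected best models, relative to all rejections
        # (taken as 0 by convention when nothing is rejected).
        "fdr": fp / n_declared if n_declared else 0.0,
        # TPR: correctly rejected worse models, relative to all worse models.
        "tpr": tp / n_worse if n_worse else float("nan"),
    }


# Hypothetical toy case with l = 5 candidate models.
truly_worse = [False, False, True, True, True]
declared_worse = [False, True, True, True, False]
rates = comparison_error_rates(truly_worse, declared_worse)
```

In this toy case one of the two best models is wrongly declared worse, so the false positive rate is one half, while the false discovery rate is one third because that single mistake is measured against three rejections.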

kernel stein test, multiple model comparison, name change, (3 more...)

Country: North America > United States > Illinois > Cook County > Chicago (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Subspace Clustering via Tangent Cones

Amin Jalali, Rebecca Willett

Neural Information Processing Systems | Nov-21-2025, 13:12:47 GMT

Given samples lying on any of a number of subspaces, subspace clustering is the task of grouping the samples based on their corresponding subspaces. Many subspace clustering methods operate by assigning a measure of affinity to each pair of points and feeding these affinities into a graph clustering algorithm. This paper proposes a new paradigm for subspace clustering that computes affinities based on the corresponding conic geometry. The proposed conic subspace clustering (CSC) approach considers the convex hull of a collection of normalized data points and the corresponding tangent cones. The union of subspaces underlying the data imposes a strong association between the tangent cone at a sample x and the original subspace containing x. In addition to describing this novel geometric perspective, this paper provides a practical algorithm for subspace clustering that leverages this perspective, where a tangent cone membership test is used to estimate the affinities. This algorithm is accompanied by deterministic and stochastic guarantees on the properties of the learned affinity matrix, on the true and false positive rates and spread, which directly translate into the overall clustering accuracy.

affinity matrix, subspace, true positive rate, (11 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Equality of Opportunity in Supervised Learning

Moritz Hardt, Eric Price, Nati Srebro

Neural Information Processing Systems | Nov-21-2025, 08:18:56 GMT

We encourage readers to consult the more complete manuscript on the arXiv.

artificial intelligence, machine learning, predictor, (17 more...)

Country: North America > United States (0.93)

Industry:

Banking & Finance (0.94)
Law (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Recycling Privileged Learning and Distribution Matching for Fairness

Novi Quadrianto, Viktoriia Sharmanska

Neural Information Processing Systems | Nov-21-2025, 05:53:09 GMT

Machine learning technologies have permeated everyday life and it is common nowadays that an automated system makes decisions for/about us, such as who is going to get bank credit.

constraint, evolutionary algorithm, machine learning, (19 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)
(2 more...)

Industry: Law (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)
