AITopics | comparison oracle

ComPO: Preference Alignment via Comparison Oracles

Neural Information Processing SystemsJun-21-2026, 21:53:02 GMT

Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of likelihood displacement, which can be driven by noisy preference pairs that induce similar likelihood for preferred and dispreferred responses. The contributions of this paper are two-fold. First, we propose a preference alignment method based on zeroth-order, comparison-based optimization via comparison oracles and provide convergence guarantees for its basic mechanism. Second, we improve our method using some heuristics and conduct the experiments to demonstrate the flexibility and compatibility of practical mechanisms in improving the performance of LLMs using noisy preference pairs. Evaluations are conducted across multiple base and instruction-tuned models (Mistral-7B, Llama-3-8B and Gemma-2-9B) with benchmarks (AlpacaEval 2, MT-Bench and Arena-Hard)1. Experimental results show the effectiveness of our method as an alternative to addressing the limitations of existing methods, not only likelihood displacement but verbosity. A highlight of our work is that we evidence the importance of designing specialized methods for preference pairs with distinct likelihood margin, which complements the recent findings in Razin et al. [73].

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

efb9629755e598c4f261c44aeb6fde5e-Paper-Conference.pdf

Neural Information Processing SystemsApr-30-2026, 06:10:13 GMT

Add feedback

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Neural Information Processing SystemsMar-17-2026, 18:21:20 GMT

We study the problem of interactively learning a binary classifier using noisy labeling and pairwise comparison oracles, where the comparison oracle answers which one in the given two instances is more likely to be positive. Learning from such oracles has multiple applications where obtaining direct labels is harder but pairwise comparisons are easier, and the algorithm can leverage both types of oracles. In this paper, we attempt to characterize how the access to an easier comparison oracle helps in improving the label and total query complexity. We show that the comparison oracle reduces the learning problem to that of learning a threshold function. We then present an algorithm that interactively queries the label and comparison oracles and we characterize its query complexity under Tsybakov and adversarial noise conditions for the comparison and labeling oracles. Our lower bounds show that our label and total query complexity is almost optimal.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Industry: Education (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Add feedback

Reinforcement learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) directly learns from reward

Neural Information Processing SystemsFeb-17-2026, 21:20:54 GMT

The latter case can be further reduced to adversarial MDP when preferences only depend on the final state.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)
Europe > France (0.04)
Europe > Austria (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Neural Information Processing SystemsNov-21-2025, 16:11:04 GMT

We study the problem of interactively learning a binary classifier using noisy labeling and pairwise comparison oracles, where the comparison oracle answers which one in the given two instances is more likely to be positive. Learning from such oracles has multiple applications where obtaining direct labels is harder but pairwise comparisons are easier, and the algorithm can leverage both types of oracles. In this paper, we attempt to characterize how the access to an easier comparison oracle helps in improving the label and total query complexity. We show that the comparison oracle reduces the learning problem to that of learning a threshold function. We then present an algorithm that interactively queries the label and comparison oracles and we characterize its query complexity under Tsybakov and adversarial noise conditions for the comparison and labeling oracles. Our lower bounds show that our label and total query complexity is almost optimal.

name change, noise-tolerant interactive learning, pairwise comparison, (5 more...)

Neural Information Processing Systems

Industry: Education > Educational Setting > Online (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)

Add feedback

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Yichong Xu, Hongyang Zhang, Kyle Miller, Aarti Singh, Artur Dubrawski

Neural Information Processing SystemsNov-21-2025, 13:23:22 GMT

In this paper, we attempt to characterize how the access to an easier comparison oracle helps in improving the label and total query complexity.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Education > Educational Setting > Online (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ComPO: Preference Alignment via Comparison Oracles

Chen, Peter, Chen, Xi, Yin, Wotao, Lin, Tianyi

arXiv.org Artificial IntelligenceOct-28-2025

Direct alignment methods are increasingly used for aligning large language models (LLMs) with human preferences. However, these methods suffer from the issues of verbosity and likelihood displacement, which can be driven by the noisy preference pairs that induce similar likelihood for preferred and dispreferred responses. The contributions of this paper are two-fold. First, we propose a new preference alignment method based on zeroth-order, comparison-based optimization via comparison oracles and provide convergence guarantees for its basic scheme. Second, we improve our method using some heuristics and conduct the experiments to demonstrate the flexibility and compatibility of practical scheme in improving the performance of LLMs using noisy preference pairs. Evaluations are conducted across multiple base and instruction-tuned models (Mistral-7B, Llama-3-8B and Gemma-2-9B) with benchmarks (AlpacaEval 2, MT-Bench and Arena-Hard). Experimental results show the effectiveness of our method as an alternative to addressing the limitations of existing direct alignment methods. A highlight of our work is that we evidence the importance of designing specialized methods for preference pairs with distinct likelihood margin, which complements the recent findings in Razin et al (2025).

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.05465

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles

Gupta, Kushagra, Murthy, Surya, Karabag, Mustafa O., Topcu, Ufuk, Fridovich-Keil, David

arXiv.org Artificial IntelligenceOct-20-2025

Cooperative bargaining games are widely used to model resource allocation and conflict resolution. Traditional solutions assume the mediator can access agents utility function values and gradients. However, there is an increasing number of settings, such as human AI interactions, where utility values may be inaccessible or incomparable due to unknown, nonaffine transformations. To model such settings, we consider that the mediator has access only to agents most preferred directions, i.e., normalized utility gradients in the decision space. To this end, we propose a cooperative bargaining algorithm where a mediator has access to only the direction oracle of each agent. We prove that unlike popular approaches such as the Nash and Kalai Smorodinsky bargaining solutions, our approach is invariant to monotonic nonaffine transformations, and that under strong convexity and smoothness assumptions, this approach enjoys global asymptotic convergence to Pareto stationary solutions. Moreover, we show that the bargaining solutions found by our algorithm also satisfy the axioms of symmetry and (under slightly stronger conditions) independence of irrelevant alternatives, which are popular in the literature. Finally, we conduct experiments in two domains, multi agent formation assignment and mediated stock portfolio allocation, which validate these theoretic results. All code for our experiments can be found at https://github.com/suryakmurthy/dibs_bargaining.

agent, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.14817

Country: North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Industry:

Government > Military (0.46)
Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Yichong Xu, Hongyang Zhang, Kyle Miller, Aarti Singh, Artur Dubrawski

Neural Information Processing SystemsOct-4-2024, 08:47:08 GMT

We study the problem of interactively learning a binary classifier using noisy labeling and pairwise comparison oracles, where the comparison oracle answers which one in the given two instances is more likely to be positive. Learning from such oracles has multiple applications where obtaining direct labels is harder but pairwise comparisons are easier, and the algorithm can leverage both types of oracles. In this paper, we attempt to characterize how the access to an easier comparison oracle helps in improving the label and total query complexity. We show that the comparison oracle reduces the learning problem to that of learning a threshold function. We then present an algorithm that interactively queries the label and comparison oracles and we characterize its query complexity under Tsybakov and adversarial noise conditions for the comparison and labeling oracles. Our lower bounds show that our label and total query complexity is almost optimal.

algorithm, complexity, oracle, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Education > Educational Setting > Online (0.41)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Randomized Algorithms for Comparison-based Search

Neural Information Processing SystemsMar-15-2024, 13:07:56 GMT

This paper addresses the problem of finding the nearest neighbor (or one of the R -nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: \emph{combinatorial disorder} D which defines approximate triangle inequalities on ranks.

comparison-based search, log 2, randomized algorithm, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.91)

Add feedback

Filters

Collaborating Authors

comparison oracle

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

ComPO: Preference Alignment via Comparison Oracles

efb9629755e598c4f261c44aeb6fde5e-Paper-Conference.pdf

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Reinforcement learning from Human Feedback (RLHF) learns from preference signals, while standard Reinforcement Learning (RL) directly learns from reward

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

ComPO: Preference Alignment via Comparison Oracles

Cooperative Bargaining Games Without Utilities: Mediated Solutions from Direction Oracles

Noise-Tolerant Interactive Learning Using Pairwise Comparisons

Randomized Algorithms for Comparison-based Search