AITopics | pairwise comparison

Collaborating Authors

pairwise comparison

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Sample Complexity Bounds for Active Ranking from Multi-wise Comparisons

Neural Information Processing SystemsApr-25-2026, 02:43:16 GMT

We study the sample complexity (i.e., the number of comparisons needed) bounds for actively ranking a set of n items from multi-wise comparisons. Here, a multiwise comparison takes m items as input and returns a (noisy) result about the best item (the winner feedback) or the order of these items (the full-ranking feedback). We consider two basic ranking problems: top-k items selection and full ranking. Unlike previous works that study ranking from multi-wise comparisons, in this paper, we do not require any parametric model or assumption and work on the fundamental setting where each comparison returns the correct result with probability 1or a certain probability larger than 12. This paper helps understand whether and to what degree utilizing multi-wise comparisons can reduce the sample complexity for the ranking problems compared to ranking from pairwise comparisons. Specifically, under the winner feedback setting, one can reduce the sample complexity for top-k selection up to an m factor and that for full ranking up to a logm factor. Under the full-ranking feedback setting, one can reduce the sample complexity for top-k selection up to an m factor and that for full ranking up to an mlogm factor. We also conduct numerical simulations to confirm our theoretical results.

artificial intelligence, machine learning, mathematics of computing, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Mathematics of Computing (0.70)

Add feedback

22508552d3fc22f867e33e6c56b30b16-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 02:43:13 GMT

artificial intelligence, machine learning, sample complexity, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Noisy Nonreciprocal Pairwise Comparisons: Scale Variation, Noise Calibration, and Admissible Ranking Regions

Magnot, Jean-Pierre

arXiv.org Machine LearningApr-7-2026

Pairwise comparisons are widely used in decision analysis, preference modeling, and evaluation problems. In many practical situations, the observed comparison matrix is not reciprocal. This lack of reciprocity is often treated as a defect to be corrected immediately. In this article, we adopt a different point of view: part of the nonreciprocity may reflect a genuine variation in the evaluation scale, while another part is due to random perturbations. We introduce an additive model in which the unknown underlying comparison matrix is consistent but not necessarily reciprocal. The reciprocal component carries the global ranking information, whereas the symmetric component describes possible scale variation. Around this structured matrix, we add a random perturbation and show how to estimate the noise level, assess whether the scale variation remains moderate, and assign probabilities to admissible ranking regions in the sense of strict ranking by pairwise comparisons. We also compare this approach with the brutal projection onto reciprocal matrices, which suppresses all symmetric information at once. The Gaussian perturbation model is used here not because human decisions are exactly Gaussian, but because observed judgment errors often result from the accumulation of many small effects. In such a context, the central limit principle provides a natural heuristic justification for Gaussian noise. This makes it possible to derive explicit estimators and probability assessments while keeping the model interpretable for decision problems.

artificial intelligence, matrix, scale deformation, (14 more...)

arXiv.org Machine Learning

2604.04588

Country:

South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
North America > United States > New York (0.04)
Europe > Slovakia > Presov > Prešov (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Add feedback

Prediction-Powered Ranking of Large Language Models

Neural Information Processing SystemsMar-22-2026, 13:20:07 GMT

Large language models are often ranked according to their level of alignment with human preferences---a model is better than other models if its outputs are more frequently preferred by humans. One of the popular ways to elicit human preferences utilizes pairwise comparisons between the outputs provided by different models to the same inputs. However, since gathering pairwise comparisons by humans is costly and time-consuming, it has become a common practice to gather pairwise comparisons by a strong large language model---a model strongly aligned with human preferences. Surprisingly, practitioners cannot currently measure the uncertainty that any mismatch between human and model preferences may introduce in the constructed rankings. In this work, we develop a statistical framework to bridge this gap. Given a (small) set of pairwise comparisons by humans and a large set of pairwise comparisons by a model, our framework provides a rank-set---a set of possible ranking positions---for each of the models under comparison. Moreover, it guarantees that, with a probability greater than or equal to a user-specified value, the rank-sets cover the true ranking consistent with the distribution of human pairwise preferences asymptotically. Using pairwise comparisons made by humans in the LMSYS Chatbot Arena platform and pairwise comparisons made by three strong large language models, we empirically demonstrate the effectivity of our framework and show that the rank-sets constructed using only pairwise comparisons by the strong large language models are often inconsistent with (the distribution of) human pairwise preferences.

artificial intelligence, large language model, natural language, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)

Add feedback

cd47cd67caa87f5b1944e00f6781598f-Paper-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 05:01:28 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Chess (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Active preference learning for ordering items in-and out-of-sample Herman Bergström Chalmers University of Technology and University of Gothenburg hermanb@chalmers.se Emil Carlsson

Neural Information Processing SystemsFeb-16-2026, 07:02:03 GMT

Learning an ordering of items based on pairwise comparisons is useful when items are difficult to rate consistently on an absolute scale, for example, when annotators have to make subjective assessments. When exhaustive comparison is infeasible, actively sampling item pairs can reduce the number of annotations necessary for learning an accurate ordering. However, many algorithms ignore shared structure between items, limiting their sample efficiency and precluding generalization to new items. It is also common to disregard how noise in comparisons varies between item pairs, despite it being informative of item similarity. In this work, we study active preference learning for ordering items with contextual attributes, both in-and out-of-sample. We give an upper bound on the expected ordering error of a logistic preference model as a function of which items have been compared. Next, we propose an active learning strategy that samples items to minimize this bound by accounting for aleatoric and epistemic uncertainty in comparisons. We evaluate the resulting algorithm, and a variant aimed at reducing model misspecification, in multiple realistic ordering tasks with comparisons made by human annotators. Our results demonstrate superior sample efficiency and generalization compared to non-contextual ranking approaches and active preference learning baselines.

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: