Goto

Collaborating Authors

 rank aggregation



FairRankAggregation

Neural Information Processing Systems

Ranking algorithms find extensive usage in diverse areas such as web search, employment,collegeadmission,voting,etc.



A theoretical guarantee for SyncRank

Rao, Yang

arXiv.org Machine Learning

The statistical ranking problem--inferring a global ordering of items from incomplete and noisy pairwise comparisons--emerges naturally in competitive sports analysis, preference aggregation, and economic exchange systems. Traditional approaches rooted in social choice theory often falter when confronted with modern datasets characterized by two pervasive challenges: (1) the comparisons are sparsely observed, with measurement graphs far from complete; (2) the noise exhibits strong heterogeneity, where certain subsets of comparisons may be significantly more reliable than others. These limitations are particularly evident in applications like soccer league standings analysis, where the outcome matrix contains both structured noise (e.g., home advantage biases) and random outliers. Recent advances in group synchronization theory provide a novel geometric perspective for this classical problem. By mapping player ranks to phases on the unit circle and rank differences to angular offsets, the ranking task becomes equivalent to solving an instance of the angular synchronization problem over SO(2). This reformulation inherits key theoretical guarantees from the synchronization literature: spectral and semidefinite programming (SDP) relaxations can provably recover the underlying ranking under quantifiable noise thresholds. Crucially, the circular representation inherently handles cyclic inconsistencies through phase wrapping, circumventing the need for explicit outlier removal mechanisms required by linear embedding approaches.




Rank Aggregation in Crowdsourcing for Listwise Annotations

Luo, Wenshui, Liu, Haoyu, Ding, Yongliang, Zhou, Tao, wan, Sheng, Wu, Runze, Lin, Minmin, Zhang, Cong, Fan, Changjie, Gong, Chen

arXiv.org Artificial Intelligence

Rank aggregation through crowdsourcing has recently gained significant attention, particularly in the context of listwise ranking annotations. However, existing methods primarily focus on a single problem and partial ranks, while the aggregation of listwise full ranks across numerous problems remains largely unexplored. This scenario finds relevance in various applications, such as model quality assessment and reinforcement learning with human feedback. In light of practical needs, we propose LAC, a Listwise rank Aggregation method in Crowdsourcing, where the global position information is carefully measured and included. In our design, an especially proposed annotation quality indicator is employed to measure the discrepancy between the annotated rank and the true rank. We also take the difficulty of the ranking problem itself into consideration, as it directly impacts the performance of annotators and consequently influences the final results. To our knowledge, LAC is the first work to directly deal with the full rank aggregation problem in listwise crowdsourcing, and simultaneously infer the difficulty of problems, the ability of annotators, and the ground-truth ranks in an unsupervised way. To evaluate our method, we collect a real-world business-oriented dataset for paragraph ranking. Experimental results on both synthetic and real-world benchmark datasets demonstrate the effectiveness of our proposed LAC method.


Federated Aggregation of Mallows Rankings: A Comparative Analysis of Borda and Lehmer Coding

Sima, Jin, Rana, Vishal, Milenkovic, Olgica

arXiv.org Artificial Intelligence

Rank aggregation combines multiple ranked lists into a consensus ranking. In fields like biomedical data sharing, rankings may be distributed and require privacy. This motivates the need for federated rank aggregation protocols, which support distributed, private, and communication-efficient learning across multiple clients with local data. We present the first known federated rank aggregation methods using Borda scoring and Lehmer codes, focusing on the sample complexity for federated algorithms on Mallows distributions with a known scaling factor $\phi$ and an unknown centroid permutation $\sigma_0$. Federated Borda approach involves local client scoring, nontrivial quantization, and privacy-preserving protocols. We show that for $\phi \in [0,1)$, and arbitrary $\sigma_0$ of length $N$, it suffices for each of the $L$ clients to locally aggregate $\max\{C_1(\phi), C_2(\phi)\frac{1}{L}\log \frac{N}{\delta}\}$ rankings, where $C_1(\phi)$ and $C_2(\phi)$ are constants, quantize the result, and send it to the server who can then recover $\sigma_0$ with probability $\geq 1-\delta$. Communication complexity scales as $NL \log N$. Our results represent the first rigorous analysis of Borda's method in centralized and distributed settings under the Mallows model. Federated Lehmer coding approach creates a local Lehmer code for each client, using a coordinate-majority aggregation approach with specialized quantization methods for efficiency and privacy. We show that for $\phi+\phi^2<1+\phi^N$, and arbitrary $\sigma_0$ of length $N$, it suffices for each of the $L$ clients to locally aggregate $\max\{C_3(\phi), C_4(\phi)\frac{1}{L}\log \frac{N}{\delta}\}$ rankings, where $C_3(\phi)$ and $C_4(\phi)$ are constants. Clients send truncated Lehmer coordinate histograms to the server, which can recover $\sigma_0$ with probability $\geq 1-\delta$. Communication complexity is $\sim O(N\log NL\log L)$.


Sequential Manipulation Against Rank Aggregation: Theory and Algorithm

Ma, Ke, Xu, Qianqian, Zeng, Jinshan, Liu, Wei, Cao, Xiaochun, Sun, Yingfei, Huang, Qingming

arXiv.org Artificial Intelligence

Rank aggregation with pairwise comparisons is widely encountered in sociology, politics, economics, psychology, sports, etc . Given the enormous social impact and the consequent incentives, the potential adversary has a strong motivation to manipulate the ranking list. However, the ideal attack opportunity and the excessive adversarial capability cause the existing methods to be impractical. To fully explore the potential risks, we leverage an online attack on the vulnerable data collection process. Since it is independent of rank aggregation and lacks effective protection mechanisms, we disrupt the data collection process by fabricating pairwise comparisons without knowledge of the future data or the true distribution. From the game-theoretic perspective, the confrontation scenario between the online manipulator and the ranker who takes control of the original data source is formulated as a distributionally robust game that deals with the uncertainty of knowledge. Then we demonstrate that the equilibrium in the above game is potentially favorable to the adversary by analyzing the vulnerability of the sampling algorithms such as Bernoulli and reservoir methods. According to the above theoretical analysis, different sequential manipulation policies are proposed under a Bayesian decision framework and a large class of parametric pairwise comparison models. For attackers with complete knowledge, we establish the asymptotic optimality of the proposed policies. To increase the success rate of the sequential manipulation with incomplete knowledge, a distributionally robust estimator, which replaces the maximum likelihood estimation in a saddle point problem, provides a conservative data generation solution. Finally, the corroborating empirical evidence shows that the proposed method manipulates the results of rank aggregation methods in a sequential manner.


Rate-Optimal Rank Aggregation with Private Pairwise Rankings

Xu, Shirong, Sun, Will Wei, Cheng, Guang

arXiv.org Machine Learning

In various real-world scenarios like recommender systems and political surveys, pairwise rankings are commonly collected and utilized for rank aggregation to obtain an overall ranking of items. However, preference rankings can reveal individuals' personal preferences, underscoring the need to protect them before releasing for downstream analysis. In this paper, we address the challenge of preserving privacy while ensuring the utility of rank aggregation based on pairwise rankings generated from the Bradley-Terry-Luce (BTL) model. Using the randomized response mechanism to perturb raw pairwise rankings is a common privacy protection strategy used in practice, but a critical challenge arises because the privatized rankings no longer adhere to the BTL model, resulting in significant bias in downstream rank aggregation tasks. Motivated from this, we propose a debiased randomized response mechanism to protect the raw pairwise rankings, ensuring consistent estimation of true preferences and rankings in downstream rank aggregation. Theoretically, we offer insights into the relationship between overall privacy guarantees and estimation errors from private ranking data, and establish minimax rates for estimation errors. This enables the determination of optimal privacy guarantees that balance consistency in rank aggregation with robust privacy protection. We also investigate convergence rates of expected ranking errors for partial and full ranking recovery, quantifying how privacy protection influences the specification of top-$K$ item sets and complete rankings. Our findings are validated through extensive simulations and a real application.