ranking position
Breaking the Top-$K$ Barrier: Advancing Top-$K$ Ranking Metrics Optimization in Recommender Systems
Yang, Weiqin, Chen, Jiawei, Zhang, Shengjia, Wu, Peng, Sun, Yuegang, Feng, Yan, Chen, Chun, Wang, Can
In the realm of recommender systems (RS), Top-$K$ ranking metrics such as NDCG@$K$ are the gold standard for evaluating recommendation performance. However, during the training of recommendation models, optimizing NDCG@$K$ poses significant challenges due to its inherent discontinuous nature and the intricate Top-$K$ truncation. Recent efforts to optimize NDCG@$K$ have either overlooked the Top-$K$ truncation or suffered from high computational costs and training instability. To overcome these limitations, we propose SoftmaxLoss@$K$ (SL@$K$), a novel recommendation loss tailored for NDCG@$K$ optimization. Specifically, we integrate the quantile technique to handle Top-$K$ truncation and derive a smooth upper bound for optimizing NDCG@$K$ to address discontinuity. The resulting SL@$K$ loss has several desirable properties, including theoretical guarantees, ease of implementation, computational efficiency, gradient stability, and noise robustness. Extensive experiments on four real-world datasets and three recommendation backbones demonstrate that SL@$K$ outperforms existing losses with a notable average improvement of 6.03%. The code is available at https://github.com/Tiny-Snow/IR-Benchmark.
Optimizing Preference Alignment with Differentiable NDCG Ranking
Zhou, Jiacong, Wang, Xianyun, Yu, Jun
Aligning large language models with human preferences improves interaction quality and safety by ensuring outputs better reflect human values. A promising strategy involves Reinforcement Learning from Human Feedback (RLHF), starting with collecting and ranking responses generated by a supervised fine-tuning model to refine alignment. Current methods (DPO) focus on learning from pairwise preference data, categorizing responses into preferred and less preferred pairs, and optimizing by maximizing pairwise margins. Recent studies have uncovered a substantial discrepancy between the theoretical aspirations of preference learning and its real-world results. Current preference alignment techniques underperform expectations, with ranking accuracies below 60% on standard datasets. This suggests existing methods inadequately capture ideal preference relationships within sequences. To address this challenge, this paper introduces Direct Ranking Preference Optimization (DRPO), a novel method that views human preference alignment as a Learning-to-Rank (LTR) task. DRPO leverages NDCG, a widely used LTR metric, to optimize the ranking of responses within lists based on preference data, thereby enhancing ranking accuracies. Due to the nondifferentiability of NDCG, we propose diffNDCG loss, a differentiable approximation facilitated by a sorting network to simulate NDCG. Furthermore, to improve the quality of generated response, we propose a novel margin-based Adaptive Rank Policy Score. Extensive experiments have shown that DRPO outperforms existing baseline methods, enhancing the quality of the generated responses. Large language models (LLMs), trained on extensive and diverse datasets, can be prompted to demonstrate impressive capabilities across a broad range of tasks (Huang et al., 2024; Chiang et al., 2023; OpenAI et al., 2024; Touvron et al., 2023). However, due to the varied nature of their training data, these models sometimes produce content that may not align with human preferences, including fabricated answers, offensive comments, or harmful responses (Bai et al., 2022; Wang et al., 2023). To ensure the development of AI systems that are safe and controllable, this paper investigates learning tasks for LLMs that guide them to generate responses in alignment with human preferences. Human preference alignment has become an active research area. Reinforcement Learning with Human Feedback (RLHF) (Ouyang et al., 2022) is the first proposed method in this area. However, the optimization process of RLHF is complex, and its implementation introduces challenges due to unstable and costly training. Recent studies (Hong et al., 2024; Ethayarajh et al., 2024) have started to adopt alternatives to RLHF.
Prediction-Powered Ranking of Large Language Models
Chatzi, Ivi, Straitouri, Eleni, Thejaswi, Suhas, Rodriguez, Manuel Gomez
Large language models are often ranked according to their level of alignment with human preferences -- a model is better than other models if its outputs are more frequently preferred by humans. One of the most popular ways to elicit human preferences utilizes pairwise comparisons between the outputs provided by different models to the same inputs. However, since gathering pairwise comparisons by humans is costly and time-consuming, it has become a very common practice to gather pairwise comparisons by a strong large language model -- a model strongly aligned with human preferences. Surprisingly, practitioners cannot currently measure the uncertainty that any mismatch between human and model preferences may introduce in the constructed rankings. In this work, we develop a statistical framework to bridge this gap. Given a small set of pairwise comparisons by humans and a large set of pairwise comparisons by a model, our framework provides a rank-set -- a set of possible ranking positions -- for each of the models under comparison. Moreover, it guarantees that, with a probability greater than or equal to a user-specified value, the rank-sets cover the true ranking consistent with (the distribution of) human pairwise preferences. Our framework is computationally efficient, easy to use, and does not make any assumption about the distribution of human preferences nor about the degree of alignment between the pairwise comparisons by the humans and the strong large language model.
Predicting FIFA 2022 World Cup with Machine Learning
With the approach of FIFA 2022 World Cup, the interest and discussions about which team is going to win the championship increase. Thus, I decided to test my Data Science and Machine Learning skills to develop a model analyzing historical data to simulate all games from FIFA 2022 World Cup. In order to build a Machine Learning model, we need the data from the teams. First, we need something that tells the performance of the teams, which can be extracted from previous games. Also, I decided to use the FIFA Rankings in the construction of the features.
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
Mishra, Swaroop, Arunkumar, Anjana
Models that top leaderboards often perform unsatisfactorily when deployed in real world applications; this has necessitated rigorous and expensive pre-deployment model testing. A hitherto unexplored facet of model performance is: Are our leaderboards doing equitable evaluation? In this paper, we introduce a task-agnostic method to probe leaderboards by weighting samples based on their `difficulty' level. We find that leaderboards can be adversarially attacked and top performing models may not always be the best models. We subsequently propose alternate evaluation metrics. Our experiments on 10 models show changes in model ranking and an overall reduction in previously reported performance -- thus rectifying the overestimation of AI systems' capabilities. Inspired by behavioral testing principles, we further develop a prototype of a visual analytics tool that enables leaderboard revamping through customization, based on an end user's focus area. This helps users analyze models' strengths and weaknesses, and guides them in the selection of a model best suited for their application scenario. In a user study, members of various commercial product development teams, covering 5 focus areas, find that our prototype reduces pre-deployment development and testing effort by 41% on average.
Block Randomized Optimization for Adaptive Hypergraph Learning
Karantaidis, Georgios, Sarridis, Ioannis, Kotropoulos, Constantine
The high-order relations between the content in social media sharing platforms are frequently modeled by a hypergraph. Either hypergraph Laplacian matrix or the adjacency matrix is a big matrix. Randomized algorithms are used for low-rank factorizations in order to approximately decompose and eventually invert such big matrices fast. Here, block randomized Singular Value Decomposition (SVD) via subspace iteration is integrated within adaptive hypergraph weight estimation for image tagging, as a first approach. Specifically, creating low-rank submatrices along the main diagonal by tessellation permits fast matrix inversions via randomized SVD. Moreover, a second approach is proposed for solving the linear system in the optimization problem of hypergraph learning by employing the conjugate gradient method. Both proposed approaches achieve high accuracy in image tagging measured by F1 score and succeed to reduce the computational requirements of adaptive hypergraph weight estimation.
Ranking Preserving Hashing for Fast Similarity Search
Wang, Qifan (Purdue University) | Zhang, Zhiwei (Purdue University) | Si, Luo (Purdue University)
Hashing method becomes popular for large scale similarity search due to its storage and computational efficiency. Many machine learning techniques, ranging from unsupervised to supervised, have been proposed to design compact hashing codes. Most of the existing hashing methods generate binary codes to efficiently find similar data examples to a query. However, the ranking accuracy among the retrieved data examples is not modeled. But in many real world applications, ranking measure is important for evaluating the quality of hashing codes.In this paper, we propose a novel Ranking Preserving Hashing (RPH) approach that directly optimizes a popular ranking measure, Normalized Discounted Cumulative Gain (NDCG), to obtain effective hashing codes with high ranking accuracy. The main difficulty in the direct optimization of NDCG measure is that it depends on the ranking order of data examples, which forms a non-convex non-smooth optimization problem. We address this challenge by optimizing the expectation of NDCG measure calculated based on a linear hashing function. A gradient descent method is designed to achieve the goal. An extensive set of experiments on two large scale datasets demonstrate the superior ranking performance of the proposed approach over several state-of-the-art hashing methods.