AITopics | ranking system

Collaborating Authors

ranking system

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Design Space for Explainable Ranking and Ranking Models

Hazwani, I. Al, Schmid, J., Sachdeva, M., Bernard, J.

arXiv.org Artificial IntelligenceSep-3-2025

Item ranking systems support users in multi-criteria decision-making tasks. Users need to trust rankings and ranking algorithms to reflect user preferences nicely while avoiding systematic errors and biases. However, today only few approaches help end users, model developers, and analysts to explain rankings. We report on the study of explanation approaches from the perspectives of recommender systems, explainable AI, and visualization research and propose the first cross-domain design space for explainers of item rankings. In addition, we leverage the descriptive power of the design space to characterize a) existing explainers and b) three main user groups involved in ranking explanation tasks. The generative power of the design space is a means for future designers and developers to create more target-oriented solutions in this only weakly exploited space.

artificial intelligence, design space, machine learning, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.2312/evp.20221114

2205.15305

Country: Europe > Switzerland > Zürich > Zürich (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Dropping Just a Handful of Preferences Can Change Top Large Language Model Rankings

Huang, Jenny Y., Shen, Yunyi, Wei, Dennis, Broderick, Tamara

arXiv.org Machine LearningAug-19-2025

We propose a method for evaluating the robustness of a widely used LLM ranking system -- the Bradley--Terry ranking system -- to dropping a worst-case very small fraction of evaluation data. Our approach is computationally fast and easy to adopt. When we apply our method to matchups from two popular human-preference platforms, Chatbot Arena and MT-Bench, we find that the Bradley--Terry rankings of top-performing models are remarkably sensitive to the removal of a small fraction of evaluations. Our framework also identifies the specific evaluations most responsible for such ranking flips, allowing for inspections of these influential preferences. We observe that the rankings derived from MT-Bench preferences are notably more robust than those from Chatbot Arena, likely due to MT-bench's use of expert annotators and carefully constructed prompts. Finally, we find that rankings based on crowdsourced human-evaluated systems are just as sensitive as those based on LLM-as-a-judge evaluations, where in both, dropping as little as 0.02% of the total evaluations in the dataset can change the top-ranked model.

chatbot arena, large language model, machine learning, (20 more...)

arXiv.org Machine Learning

2508.11847

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Reinforcement Speculative Decoding for Fast Ranking

Du, Yingpeng, Wei, Tianjun, Sun, Zhu, Zhang, Jie

arXiv.org Artificial IntelligenceMay-28-2025

Large Language Models (LLMs) have been widely adopted in ranking systems such as information retrieval (IR) systems and recommender systems (RSs). To alleviate the latency of auto-regressive decoding, some studies explore the single (first) token decoding for ranking approximation, but they suffer from severe degradation in tail positions. Although speculative decoding (SD) methods can be a remedy with verification at different positions, they face challenges in ranking systems due to their left-to-right decoding paradigm. Firstly, ranking systems require strict latency constraints, but verification rounds in SD methods remain agnostic; Secondly, SD methods usually discard listwise ranking knowledge about unaccepted items in previous rounds, hindering future multi-token prediction, especially when candidate tokens are the unaccepted items. In this paper, we propose a Reinforcement Speculative Decoding method for fast ranking inference of LLMs. To meet the ranking systems' latency requirement, we propose an up-to-down decoding paradigm that employs an agent to iteratively modify the ranking sequence under a constrained budget. Specifically, we design a ranking-tailored policy optimization, actively exploring optimal multi-round ranking modification policy verified by LLMs via reinforcement learning (RL). To better approximate the target LLM under the constrained budget, we trigger the agent fully utilizing the listwise ranking knowledge about all items verified by LLMs across different rounds in RL, enhancing the modification policy of the agent. More importantly, we demonstrate the theoretical robustness and advantages of our paradigm and implementation. Experiments on both IR and RS tasks show the effectiveness of our proposed method.

artificial intelligence, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2505.20316

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

A Unified Knowledge-Distillation and Semi-Supervised Learning Framework to Improve Industrial Ads Delivery Systems

Eghbalzadeh, Hamid, Wang, Yang, Li, Rui, Mo, Yuji, Ding, Qin, Fu, Jiaxiang, Dai, Liang, Gu, Shuo, Noorshams, Nima, Park, Sem, Long, Bo, Feng, Xue

arXiv.org Artificial IntelligenceFeb-5-2025

Industrial ads ranking systems conventionally rely on labeled impression data, which leads to challenges such as overfitting, slower incremental gain from model scaling, and biases due to discrepancies between training and serving data. To overcome these issues, we propose a Unified framework for Knowledge-Distillation and Semi-supervised Learning (UKDSL) for ads ranking, empowering the training of models on a significantly larger and more diverse datasets, thereby reducing overfitting and mitigating training-serving data discrepancies. We provide detailed formal analysis and numerical simulations on the inherent miscalibration and prediction bias of multi-stage ranking systems, and show empirical evidence of the proposed framework's capability to mitigate those. Compared to prior work, UKDSL can enable models to learn from a much larger set of unlabeled data, hence, improving the performance while being computationally efficient. Finally, we report the successful deployment of UKDSL in an industrial setting across various ranking models, serving users at multi-billion scale, across various surfaces, geological locations, clients, and optimize for various events, which to the best of our knowledge is the first of its kind in terms of the scale and efficiency at which it operates.

artificial intelligence, machine learning, ranking system, (18 more...)

arXiv.org Artificial Intelligence

2502.06834

Country: North America > United States > New York > New York County > New York City (0.06)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.63)

Add feedback

Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat

Daynauth, Roland, Clarke, Christopher, Flautner, Krisztian, Tang, Lingjia, Mars, Jason

arXiv.org Artificial IntelligenceNov-19-2024

Deciding which large language model (LLM) to use is a complex challenge. Pairwise ranking has emerged as a new method for evaluating human preferences for LLMs. This approach entails humans evaluating pairs of model outputs based on a predefined criterion. By collecting these comparisons, a ranking can be constructed using methods such as Elo. However, applying these algorithms as constructed in the context of LLM evaluation introduces several challenges. In this paper, we explore the effectiveness of ranking systems for head-to-head comparisons of LLMs. We formally define a set of fundamental principles for effective ranking and conduct a series of extensive evaluations on the robustness of several ranking algorithms in the context of LLMs. Our analysis uncovers key insights into the factors that affect ranking accuracy and efficiency, offering guidelines for selecting the most appropriate methods based on specific evaluation contexts and resource constraints.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.14483

Country:

North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Leisure & Entertainment > Sports (0.68)
Leisure & Entertainment > Games > Chess (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

LLMEval: A Preliminary Study on How to Evaluate Large Language Models

Zhang, Yue, Zhang, Ming, Yuan, Haipeng, Liu, Shichun, Shi, Yongyao, Gui, Tao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial IntelligenceDec-17-2023

Recently, the evaluation of Large Language Models has emerged as a popular area of research. The three crucial questions for LLM evaluation are ``what, where, and how to evaluate''. However, the existing research mainly focuses on the first two questions, which are basically what tasks to give the LLM during testing and what kind of knowledge it should deal with. As for the third question, which is about what standards to use, the types of evaluators, how to score, and how to rank, there hasn't been much discussion. In this paper, we analyze evaluation methods by comparing various criteria with both manual and automatic evaluation, utilizing onsite, crowd-sourcing, public annotators and GPT-4, with different scoring methods and ranking systems. We propose a new dataset, LLMEval and conduct evaluations on 20 LLMs. A total of 2,186 individuals participated, leading to the generation of 243,337 manual annotations and 57,511 automatic evaluation results. We perform comparisons and analyses of different settings and conduct 10 conclusions that can provide some insights for evaluating LLM in the future. The dataset and the results are publicly available at https://github.com/llmeval .

annotator, criteria, evaluation, (16 more...)

arXiv.org Artificial Intelligence

2312.07398

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports (0.46)
Leisure & Entertainment > Games > Chess (0.32)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Add feedback

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

Wang, Xuewei, Jin, Qiang, Huang, Shengyu, Zhang, Min, Liu, Xi, Zhao, Zhengli, Chen, Yukun, Zhang, Zhengyu, Yang, Jiyan, Wen, Ellie, Chordia, Sagar, Chen, Wenlin, Huang, Qin

arXiv.org Artificial IntelligenceJul-12-2023

Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.

artificial intelligence, machine learning, multi-task learning framework, (14 more...)

arXiv.org Artificial Intelligence

2307.11096

Country:

North America > United States > California > Los Angeles County > Long Beach (0.05)
North America > United States > California > San Mateo County > Menlo Park (0.04)

Genre: Research Report (0.50)

Industry: Marketing (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models

Liu, Jiawei, Kang, Yangyang, Tang, Di, Song, Kaisong, Sun, Changlong, Wang, Xiaofeng, Lu, Wei, Liu, Xiaozhong

arXiv.org Artificial IntelligenceApr-18-2023

Neural text ranking models have witnessed significant advancement and are increasingly being deployed in practice. Unfortunately, they also inherit adversarial vulnerabilities of general neural models, which have been detected but remain underexplored by prior studies. Moreover, the inherit adversarial vulnerabilities might be leveraged by blackhat SEO to defeat better-protected search engines. In this study, we propose an imitation adversarial attack on black-box neural passage ranking models. We first show that the target passage ranking model can be transparentized and imitated by enumerating critical queries/candidates and then train a ranking imitation model. Leveraging the ranking imitation model, we can elaborately manipulate the ranking results and transfer the manipulation attack to the target ranking model. For this purpose, we propose an innovative gradient-based attack method, empowered by the pairwise objective function, to generate adversarial triggers, which causes premeditated disorderliness with very few tokens. To equip the trigger camouflages, we add the next sentence prediction loss and the language model fluency constraint to the objective function. Experimental results on passage ranking demonstrate the effectiveness of the ranking imitation attack model and adversarial triggers against various SOTA neural ranking models. Furthermore, various mitigation analyses and human evaluation show the effectiveness of camouflages when facing potential mitigation approaches. To motivate other scholars to further investigate this novel and important problem, we make the experiment data and code publicly available.

information retrieval, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2209.06506

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(24 more...)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Add feedback

Layout-aware Webpage Quality Assessment

Cheng, Anfeng, Liu, Yiding, Li, Weibin, Dong, Qian, Wang, Shuaiqiang, Huang, Zhengjie, Feng, Shikun, Cheng, Zhicong, Yin, Dawei

arXiv.org Artificial IntelligenceFeb-5-2023

Identifying high-quality webpages is fundamental for real-world search engines, which can fulfil users' information need with the less cognitive burden. Early studies of \emph{webpage quality assessment} usually design hand-crafted features that may only work on particular categories of webpages (e.g., shopping websites, medical websites). They can hardly be applied to real-world search engines that serve trillions of webpages with various types and purposes. In this paper, we propose a novel layout-aware webpage quality assessment model currently deployed in our search engine. Intuitively, layout is a universal and critical dimension for the quality assessment of different categories of webpages. Based on this, we directly employ the meta-data that describes a webpage, i.e., Document Object Model (DOM) tree, as the input of our model. The DOM tree data unifies the representation of webpages with different categories and purposes and indicates the layout of webpages. To assess webpage quality from complex DOM tree data, we propose a graph neural network (GNN) based method that extracts rich layout-aware information that implies webpage quality in an end-to-end manner. Moreover, we improve the GNN method with an attentive readout function, external web categories and a category-aware sampling method. We conduct rigorous offline and online experiments to show that our proposed solution is effective in real search engines, improving the overall usability and user experience.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.12152

Country:

North America > United States > California > Los Angeles County > Long Beach (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.05)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Empirical complexity of comparator-based nearest neighbor descent

Baron, Jacob D., Darling, R. W. R.

arXiv.org Machine LearningJan-30-2022

A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, and a Function>, which enables any $x \in S$ to decide which of $y, z \in S\setminus\{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree $K$ digraph on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in higher dimensions, as theory would predict.

algorithm, implementation, k-nn descent, (12 more...)

arXiv.org Machine Learning

2202.00517

Country:

North America > United States (0.28)
Asia > Afghanistan > Parwan Province > Charikar (0.05)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)

Add feedback