AITopics | ranking error

Collaborating Authors

ranking error

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?

Wang, Shaobo, Wang, Cong, Fu, Wenjie, Min, Yue, Feng, Mingquan, Guan, Isabel, Hu, Xuming, He, Conghui, Wang, Cunxiang, Yang, Kexin, Ren, Xingzhang, Huang, Fei, Liu, Dayiheng, Zhang, Linfeng

arXiv.org Artificial IntelligenceOct-14-2025

As the demand for comprehensive evaluations of diverse model capabilities steadily increases, benchmark suites have correspondingly grown significantly in scale. Despite notable advances in redundancy reduction and subset-level performance prediction, a systematic framework that effectively integrates these methods to ensure both prediction accuracy and ranking consistency is still largely elusive. In this paper, we first perform a sample-level analysis of benchmark redundancy and identify several highly similar samples that can be eliminated. Besides, we frame benchmark compression as an optimization problem with the aim of score reconstruction. Building on these, we then propose EssenceBench, a coarse-to-fine framework utilizing an iterative Genetic Algorithm (GA), which takes the advantages of fitness-based subset search and attribution-based sample search. Compared to previous methods, our approach yields superior compression results with lower reconstruction error and markedly higher efficiency. In particular, on the HellaSwag benchmark (10K samples), our method preserves the ranking of all models shifting within 5% using 25x fewer samples, and achieves 95% ranking preservation shifting within 5% using only 200x fewer samples.

large language model, machine learning, ranking error, (16 more...)

arXiv.org Artificial Intelligence

2510.10457

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Fragility-aware Classification for Understanding Risk and Improving Generalization

Yang, Chen, Cui, Zheng, Long, Daniel Zhuoyu, Qi, Jin, Zhan, Ruohan

arXiv.org Artificial IntelligenceFeb-18-2025

Classification models play a critical role in data-driven decision-making applications such as medical diagnosis, user profiling, recommendation systems, and default detection. Traditional performance metrics, such as accuracy, focus on overall error rates but fail to account for the confidence of incorrect predictions, thereby overlooking the risk of confident misjudgments. This risk is particularly significant in cost-sensitive and safety-critical domains like medical diagnosis and autonomous driving, where overconfident false predictions may cause severe consequences. To address this issue, we introduce the Fragility Index (FI), a novel metric that evaluates classification performance from a risk-averse perspective by explicitly capturing the tail risk of confident misjudgments. To enhance generalizability, we define FI within the robust satisficing (RS) framework, incorporating data uncertainty. We further develop a model training approach that optimizes FI while maintaining tractability for common loss functions. Specifically, we derive exact reformulations for cross-entropy loss, hinge-type loss, and Lipschitz loss, and extend the approach to deep learning models. Through synthetic experiments and real-world medical diagnosis tasks, we demonstrate that FI effectively identifies misjudgment risk and FI-based training improves model robustness and generalizability. Finally, we extend our framework to deep neural network training, further validating its effectiveness in enhancing deep learning models.

artificial intelligence, fragility-aware classification, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.13024

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)
Transportation > Ground > Road (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Rate-Optimal Rank Aggregation with Private Pairwise Rankings

Xu, Shirong, Sun, Will Wei, Cheng, Guang

arXiv.org Machine LearningFeb-26-2024

In various real-world scenarios like recommender systems and political surveys, pairwise rankings are commonly collected and utilized for rank aggregation to obtain an overall ranking of items. However, preference rankings can reveal individuals' personal preferences, underscoring the need to protect them before releasing for downstream analysis. In this paper, we address the challenge of preserving privacy while ensuring the utility of rank aggregation based on pairwise rankings generated from the Bradley-Terry-Luce (BTL) model. Using the randomized response mechanism to perturb raw pairwise rankings is a common privacy protection strategy used in practice, but a critical challenge arises because the privatized rankings no longer adhere to the BTL model, resulting in significant bias in downstream rank aggregation tasks. Motivated from this, we propose a debiased randomized response mechanism to protect the raw pairwise rankings, ensuring consistent estimation of true preferences and rankings in downstream rank aggregation. Theoretically, we offer insights into the relationship between overall privacy guarantees and estimation errors from private ranking data, and establish minimax rates for estimation errors. This enables the determination of optimal privacy guarantees that balance consistency in rank aggregation with robust privacy protection. We also investigate convergence rates of expected ranking errors for partial and full ranking recovery, quantifying how privacy protection influences the specification of top-$K$ item sets and complete rankings. Our findings are validated through extensive simulations and a real application.

estimation, mechanism, privacy, (16 more...)

arXiv.org Machine Learning

2402.16792

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off

Liu, Yu-An, Zhang, Ruqing, Zhang, Mingkun, Chen, Wei, de Rijke, Maarten, Guo, Jiafeng, Cheng, Xueqi

arXiv.org Artificial IntelligenceDec-16-2023

Neural ranking models (NRMs) have shown great success in information retrieval (IR). But their predictions can easily be manipulated using adversarial examples, which are crafted by adding imperceptible perturbations to legitimate documents. This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs. By incorporating adversarial examples into training data, adversarial training has become the de facto defense approach to adversarial attacks against NRMs. However, this defense mechanism is subject to a trade-off between effectiveness and adversarial robustness. In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs. We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness. Then, we define the perturbation invariance of a ranking model and prove it to be a differentiable upper bound on the boundary ranking error for attainable computation. Informed by our theoretical analysis, we design a novel \emph{perturbation-invariant adversarial training} (PIAT) method for ranking models to achieve a better effectiveness-robustness trade-off. We design a regularized surrogate loss, in which one term encourages the effectiveness to be maximized while the regularization term encourages the output to be smooth, so as to improve adversarial robustness. Experimental results on several ranking models demonstrate the superiority of PITA compared to existing adversarial defenses.

boundary ranking error, ranking error, robustness, (14 more...)

arXiv.org Artificial Intelligence

2312.10329

Country:

Asia > Middle East > Jordan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.35)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback