AITopics | ranking score

Collaborating Authors

ranking score

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ZooD: ExploitingModelZoofor Out-of-DistributionGeneralization

Neural Information Processing SystemsFeb-11-2026, 23:12:49 GMT

ZooD then continues with model aggregation consisting ofmodel ensemble andfeature selection.

artificial intelligence, dataset, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Poland (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multi-domain performance analysis with scores tailored to user preferences

Piérard, Sébastien, Deliège, Adrien, Van Droogenbroeck, Marc

arXiv.org Artificial IntelligenceDec-10-2025

The performance of algorithms, methods, and models tends to depend heavily on the distribution of cases on which they are applied, this distribution being specific to the applicative domain. After performing an evaluation in several domains, it is highly informative to compute a (weighted) mean performance and, as shown in this paper, to scrutinize what happens during this averaging. To achieve this goal, we adopt a probabilistic framework and consider a performance as a probability measure (e.g., a normalized confusion matrix for a classification task). It appears that the corresponding weighted mean is known to be the summarization, and that only some remarkable scores assign to the summarized performance a value equal to a weighted arithmetic mean of the values assigned to the domain-specific performances. These scores include the family of ranking scores, a continuum parameterized by user preferences, and that the weights to consider in the arithmetic mean depend on the user preferences. Based on this, we rigorously define four domains, named easiest, most difficult, preponderant, and bottleneck domains, as functions of user preferences. After establishing the theory in a general setting, regardless of the task, we develop new visual tools for two-class classification.

artificial intelligence, machine learning, user preference, (16 more...)

arXiv.org Artificial Intelligence

2512.08715

Country:

Europe > Belgium (0.15)
North America > United States (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Add feedback

Adversarial Ranking for Language Generation

Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-ting Sun

Neural Information Processing SystemsNov-21-2025, 12:38:19 GMT

Generative adversarial networks (GANs) have great successes on synthesizing data.

generator, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

SelfCheckAgent: Zero-Resource Hallucination Detection in Generative Large Language Models

Muhammed, Diyana, Rabby, Gollam, Auer, Sören

arXiv.org Artificial IntelligenceFeb-3-2025

Detecting hallucinations in Large Language Models (LLMs) remains a critical challenge for their reliable deployment in real-world applications. To address this, we introduce SelfCheckAgent, a novel framework integrating three different agents: the Symbolic Agent, the Specialized Detection Agent, and the Contextual Consistency Agent. These agents provide a robust multi-dimensional approach to hallucination detection. Notable results include the Contextual Consistency Agent leveraging Llama 3.1 with Chain-of-Thought (CoT) to achieve outstanding performance on the WikiBio dataset, with NonFactual hallucination detection scoring 93.64%, Factual 70.26%, and Ranking 78.48% respectively. On the AIME dataset, GPT-4o with CoT excels in NonFactual detection with 94.89% but reveals trade-offs in Factual with 30.58% and Ranking with 30.68%, underscoring the complexity of hallucination detection in the complex mathematical domains. The framework also incorporates a triangulation strategy, which increases the strengths of the SelfCheckAgent, yielding significant improvements in real-world hallucination identification. The comparative analysis demonstrates SelfCheckAgent's applicability across diverse domains, positioning it as a crucial advancement for trustworthy LLMs. These findings highlight the potentiality of consistency-driven methodologies in detecting hallucinations in LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.01812

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.14)
Europe > Germany > Lower Saxony > Hanover (0.14)
South America > Colombia > Meta Department > Villavicencio (0.04)
(15 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Foundations of the Theory of Performance-Based Ranking

Piérard, Sébastien, Halin, Anaïs, Cioppa, Anthony, Deliège, Adrien, Van Droogenbroeck, Marc

arXiv.org Artificial IntelligenceDec-18-2024

Ranking entities such as algorithms, devices, methods, or models based on their performances, while accounting for application-specific preferences, is a challenge. To address this challenge, we establish the foundations of a universal theory for performance-based ranking. First, we introduce a rigorous framework built on top of both the probability and order theories. Our new framework encompasses the elements necessary to (1) manipulate performances as mathematical objects, (2) express which performances are worse than or equivalent to others, (3) model tasks through a variable called satisfaction, (4) consider properties of the evaluation, (5) define scores, and (6) specify application-specific preferences through a variable called importance. On top of this framework, we propose the first axiomatic definition of performance orderings and performance-based rankings. Then, we introduce a universal parametric family of scores, called ranking scores, that can be used to establish rankings satisfying our axioms, while considering application-specific preferences. Finally, we show, in the case of two-class classification, that the family of ranking scores encompasses well-known performance scores, including the accuracy, the true positive rate (recall, sensitivity), the true negative rate (specificity), the positive predictive value (precision), and F1. However, we also show that some other scores commonly used to compare classifiers are unsuitable to derive performance orderings satisfying the axioms. Therefore, this paper provides the computer vision and machine learning communities with a rigorous framework for evaluating and ranking entities.

artificial intelligence, machine learning, preorder, (17 more...)

arXiv.org Artificial Intelligence

2412.04227

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

A Hitchhiker's Guide to Understanding Performances of Two-Class Classifiers

Halin, Anaïs, Piérard, Sébastien, Cioppa, Anthony, Van Droogenbroeck, Marc

arXiv.org Artificial IntelligenceDec-18-2024

Properly understanding the performances of classifiers is essential in various scenarios. However, the literature often relies only on one or two standard scores to compare classifiers, which fails to capture the nuances of application-specific requirements, potentially leading to suboptimal classifier selection. Recently, a paper on the foundations of the theory of performance-based ranking introduced a tool, called the Tile, that organizes an infinity of ranking scores into a 2D map. Thanks to the Tile, it is now possible to evaluate and compare classifiers efficiently, displaying all possible application-specific preferences instead of having to rely on a pair of scores. In this paper, we provide a first hitchhiker's guide for understanding the performances of two-class classifiers by presenting four scenarios, each showcasing a different user profile: a theoretical analyst, a method designer, a benchmarker, and an application developer. Particularly, we show that we can provide different interpretative flavors that are adapted to the user's needs by mapping different values on the Tile. As an illustration, we leverage the newly introduced Tile tool and the different flavors to rank and analyze the performances of 74 state-of-the-art semantic segmentation models in two-class classification through the eyes of the four user profiles. Through these user profiles, we demonstrate that the Tile effectively captures the behavior of classifiers in a single visualization, while accommodating an infinite number of ranking scores.

artificial intelligence, machine learning, npv tpr npv tpr 0, (10 more...)

arXiv.org Artificial Intelligence

2412.04377

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(15 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)

Add feedback

The Tile: A 2D Map of Ranking Scores for Two-Class Classification

Piérard, Sébastien, Halin, Anaïs, Cioppa, Anthony, Deliège, Adrien, Van Droogenbroeck, Marc

arXiv.org Artificial IntelligenceDec-18-2024

In the computer vision and machine learning communities, as well as in many other research domains, rigorous evaluation of any new method, including classifiers, is essential. One key component of the evaluation process is the ability to compare and rank methods. However, ranking classifiers and accurately comparing their performances, especially when taking application-specific preferences into account, remains challenging. For instance, commonly used evaluation tools like Receiver Operating Characteristic (ROC) and Precision/Recall (PR) spaces display performances based on two scores. Hence, they are inherently limited in their ability to compare classifiers across a broader range of scores and lack the capability to establish a clear ranking among classifiers. In this paper, we present a novel versatile tool, named the Tile, that organizes an infinity of ranking scores in a single 2D map for two-class classifiers, including common evaluation scores such as the accuracy, the true positive rate, the positive predictive value, Jaccard's coefficient, and all F-beta scores. Furthermore, we study the properties of the underlying ranking scores, such as the influence of the priors or the correspondences with the ROC space, and depict how to characterize any other score by comparing them to the Tile. Overall, we demonstrate that the Tile is a powerful tool that effectively captures all the rankings in a single visualization and allows interpreting them.

artificial intelligence, machine learning, ranking score, (18 more...)

arXiv.org Artificial Intelligence

2412.04309

Country:

North America > United States (0.04)
Europe > Belgium > Wallonia > Liège Province > Liège (0.04)
Asia > Middle East > Republic of Türkiye > Antalya Province > Antalya (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.49)

Industry: Health & Medicine > Diagnostic Medicine (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Adversarial Ranking for Language Generation

Kevin Lin, Dianqi Li, Xiaodong He, Zhengyou Zhang, Ming-ting Sun

Neural Information Processing SystemsOct-4-2024, 04:14:03 GMT

Neural Information Processing Systems http://nips.cc/

discriminator, generator, rankgan, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Add feedback

Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers

Chen, Shijie, Gutiérrez, Bernal Jiménez, Su, Yu

arXiv.org Artificial IntelligenceOct-3-2024

Information retrieval (IR) systems have played a vital role in modern digital life and have cemented their continued usefulness in this new era of generative AI via retrieval-augmented generation. With strong language processing capabilities and remarkable versatility, large language models (LLMs) have become popular choices for zero-shot re-ranking in IR systems. So far, LLM-based re-ranking methods rely on strong generative capabilities, which restricts their use to either specialized or powerful proprietary models. Given these restrictions, we ask: is autoregressive generation necessary and optimal for LLMs to perform re-ranking? We hypothesize that there are abundant signals relevant to re-ranking within LLMs that might not be used to their full potential via generation. To more directly leverage such signals, we propose in-context re-ranking (ICR), a novel method that leverages the change in attention pattern caused by the search query for accurate and efficient re-ranking. To mitigate the intrinsic biases in LLMs, we propose a calibration method using a content-free query. Due to the absence of generation, ICR only requires two ($O(1)$) forward passes to re-rank $N$ documents, making it substantially more efficient than generative re-ranking methods that require at least $O(N)$ forward passes. Our novel design also enables ICR to be applied to any LLM without specialized training while guaranteeing a well-formed ranking. Extensive experiments with two popular open-weight LLMs on standard single-hop and multi-hop information retrieval benchmarks show that ICR outperforms RankGPT while cutting the latency by more than 60% in practice. Through detailed analyses, we show that ICR's performance is specially strong on tasks that require more complex re-ranking signals. Our findings call for further exploration on novel ways of utilizing open-weight LLMs beyond text generation.

icr, query, rankgpt, (12 more...)

arXiv.org Artificial Intelligence

2410.02642

Country:

Europe > Greece (0.28)
Europe > Germany (0.14)
Europe > Ukraine (0.04)
(26 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Media > Film (1.00)
Leisure & Entertainment > Sports > Basketball (1.00)
Leisure & Entertainment > Sports > Baseball (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Metalearners for Ranking Treatment Effects

Vanderschueren, Toon, Verbeke, Wouter, Moraes, Felipe, Proença, Hugo Manuel

arXiv.org Machine LearningMay-3-2024

Efficiently allocating treatments with a budget constraint constitutes an important challenge across various domains. In marketing, for example, the use of promotions to target potential customers and boost conversions is limited by the available budget. While much research focuses on estimating causal effects, there is relatively limited work on learning to allocate treatments while considering the operational context. Existing methods for uplift modeling or causal inference primarily estimate treatment effects, without considering how this relates to a profit maximizing allocation policy that respects budget constraints. The potential downside of using these methods is that the resulting predictive model is not aligned with the operational context. Therefore, prediction errors are propagated to the optimization of the budget allocation problem, subsequently leading to a suboptimal allocation policy. We propose an alternative approach based on learning to rank. Our proposed methodology directly learns an allocation policy by prioritizing instances in terms of their incremental profit. We propose an efficient sampling procedure for the optimization of the ranking model to scale our methodology to large-scale data sets. Theoretically, we show how learning to rank can maximize the area under a policy's incremental profit curve. Empirically, we validate our methodology and show its effectiveness in practice through a series of experiments on both synthetic and real-world data.

metalearner, objective, treatment effect, (14 more...)

arXiv.org Machine Learning

2405.02183

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > Montserrat (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.93)
Health & Medicine > Epidemiology (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback