AITopics | quality measure

Collaborating Authors

quality measure

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Clustering Redemption–Beyond the Impossibility of Kleinberg’s Axioms

Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn

Neural Information Processing SystemsFeb-13-2026, 04:01:02 GMT

Neural Information Processing Systems http://nips.cc/

axiom, kleinberg, partition, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.97)

Add feedback

Picking a Representative Set of Solutions in Multiobjective Optimization: Axioms, Algorithms, and Experiments

Boehmer, Niclas, Wittmann, Maximilian T.

arXiv.org Artificial IntelligenceNov-17-2025

Many real-world decision-making problems involve optimizing multiple objectives simultaneously, rendering the selection of the most preferred solution a non-trivial problem: All Pareto optimal solutions are viable candidates, and it is typically up to a decision maker to select one for implementation based on their subjective preferences. To reduce the cognitive load on the decision maker, previous work has introduced the Pareto pruning problem, where the goal is to compute a fixed-size subset of Pareto optimal solutions that best represent the full set, as evaluated by a given quality measure. Reframing Pareto pruning as a multiwinner voting problem, we conduct an axiomatic analysis of existing quality measures, uncovering several unintuitive behaviors. Motivated by these findings, we introduce a new measure, directed coverage. We also analyze the computational complexity of optimizing various quality measures, identifying previously unknown boundaries between tractable and intractable cases depending on the number and structure of the objectives. Finally, we present an experimental evaluation, demonstrating that the choice of quality measure has a decisive impact on the characteristics of the selected set of solutions and that our proposed measure performs competitively or even favorably across a range of settings.

artificial intelligence, machine learning, objective, (16 more...)

arXiv.org Artificial Intelligence

2511.10716

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > France (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Measuring Sample Quality with Stein's Method

Jackson Gorham, Lester Mackey

Neural Information Processing SystemsOct-2-2025, 08:16:54 GMT

Neural Information Processing Systems http://nips.cc/

discrepancy, sequence, stein discrepancy, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Conformalized Exceptional Model Mining: Telling Where Your Model Performs (Not) Well

Du, Xin, Yang, Sikun, Duivesteijn, Wouter, Pechenizkiy, Mykola

arXiv.org Artificial IntelligenceAug-22-2025

Understanding the nuanced performance of machine learning models is essential for responsible deployment, especially in high-stakes domains like healthcare and finance. This paper introduces a novel framework, Conformalized Exceptional Model Mining, which combines the rigor of Conformal Prediction with the explanatory power of Exceptional Model Mining (EMM). The proposed framework identifies cohesive subgroups within data where model performance deviates exceptionally, highlighting regions of both high confidence and high uncertainty. We develop a new model class, mSMoPE (multiplex Soft Model Performance Evaluation), which quantifies uncertainty through conformal prediction's rigorous coverage guarantees. By defining a new quality measure, Relative Average Uncertainty Loss (RAUL), our framework isolates subgroups with exceptional performance patterns in multi-class classification and regression tasks. Experimental results across diverse datasets demonstrate the framework's effectiveness in uncovering interpretable subgroups that provide critical insights into model behavior. This work lays the groundwork for enhancing model interpretability and reliability, advancing the state-of-the-art in explainable AI and uncertainty quantification.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.15569

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing

Potin, Lucas, Figueiredo, Rosa, Labatut, Vincent, Largeron, Christine

arXiv.org Artificial IntelligenceJul-23-2025

Graph classification aims to categorize graphs based on their structural and attribute features, with applications in diverse fields such as social network analysis and bioinformatics. Among the methods proposed to solve this task, those relying on patterns (i.e. subgraphs) provide good explainability, as the patterns used for classification can be directly interpreted. To identify meaningful patterns, a standard approach is to use a quality measure, i.e. a function that evaluates the discriminative power of each pattern. However, the literature provides tens of such measures, making it difficult to select the most appropriate for a given application. Only a handful of surveys try to provide some insight by comparing these measures, and none of them specifically focuses on graphs. This typically results in the systematic use of the most widespread measures, without thorough evaluation. To address this issue, we present a comparative analysis of 38 quality measures from the literature. We characterize them theoretically, based on four mathematical properties. We leverage publicly available datasets to constitute a benchmark, and propose a method to elaborate a gold standard ranking of the patterns. We exploit these resources to perform an empirical comparison of the measures, both in terms of pattern ranking and classification performance. Moreover, we propose a clustering-based preprocessing step, which groups patterns appearing in the same graphs to enhance classification performance. Our experimental results demonstrate the effectiveness of this step, reducing the number of patterns to be processed while achieving comparable performance. Additionally, we show that some popular measures widely used in the literature are not associated with the best results.

data mining, machine learning, pattern recognition, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3743143

2507.00039

Country:

Europe > France (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Kansas (0.04)
Asia > India (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.95)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

Add feedback

Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian

Niculae, Andrei, Cosma, Adrian, Dumitrache, Cosmin, Rǎdoi, Emilian

arXiv.org Artificial IntelligenceJul-22-2025

Text-based telemedicine has become increasingly common, yet the quality of medical advice in doctor-patient interactions is often judged more on how advice is communicated rather than its clinical accuracy. To address this, we introduce Dr. Copilot , a multi-agent large language model (LLM) system that supports Romanian-speaking doctors by evaluating and enhancing the presentation quality of their written responses. Rather than assessing medical correctness, Dr. Copilot provides feedback along 17 interpretable axes. The system comprises of three LLM agents with prompts automatically optimized via DSPy. Designed with low-resource Romanian data and deployed using open-weight models, it delivers real-time specific feedback to doctors within a telemedicine platform. Empirical evaluations and live deployment with 41 doctors show measurable improvements in user reviews and response quality, marking one of the first real-world deployments of LLMs in Romanian medical settings.

copilot, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.11299

Country:

North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
(2 more...)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Health & Medicine > Health Care Technology > Telehealth (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution

Eiras-Franco, Carlos, Hedström, Anna, Höhne, Marina M. -C.

arXiv.org Artificial IntelligenceFeb-21-2025

Obtaining high-quality explanations of a model's output enables developers to identify and correct biases, align the system's behavior with human values, and ensure ethical compliance. Explainable Artificial Intelligence (XAI) practitioners rely on specific measures to gauge the quality of such explanations. These measures assess key attributes, such as how closely an explanation aligns with a model's decision process (faithfulness), how accurately it pinpoints the relevant input features (localization), and its consistency across different cases (robustness). Despite providing valuable information, these measures do not fully address a critical practitioner's concern: how does the quality of a given explanation compare to other potential explanations? Traditionally, the quality of an explanation has been assessed by comparing it to a randomly generated counterpart. This paper introduces an alternative: the Quality Gap Estimate (QGE). The QGE method offers a direct comparison to what can be viewed as the `inverse' explanation, one that conceptually represents the antithesis of the original explanation. Our extensive testing across multiple model architectures, datasets, and established quality metrics demonstrates that the QGE method is superior to the traditional approach. Furthermore, we show that QGE enhances the statistical reliability of these quality assessments. This advance represents a significant step toward a more insightful evaluation of explanations that enables a more effective inspection of a model's behavior.

dataset, explanation, qge, (16 more...)

arXiv.org Artificial Intelligence

2502.15403

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.04)
North America > United States > Maryland > Baltimore (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.88)

Add feedback

ProReco: A Process Discovery Recommender System

Huang, Tsung-Hao, Junied, Tarek, Pegoraro, Marco, van der Aalst, Wil M. P.

arXiv.org Artificial IntelligenceFeb-14-2025

Process discovery aims to automatically derive process models from historical execution data (event logs). While various process discovery algorithms have been proposed in the last 25 years, there is no consensus on a dominating discovery algorithm. Selecting the most suitable discovery algorithm remains a challenge due to competing quality measures and diverse user requirements. Manually selecting the most suitable process discovery algorithm from a range of options for a given event log is a time-consuming and error-prone task. This paper introduces ProReco, a Process discovery Recommender system designed to recommend the most appropriate algorithm based on user preferences and event log characteristics. ProReco incorporates state-of-the-art discovery algorithms, extends the feature pools from previous work, and utilizes eXplainable AI (XAI) techniques to provide explanations for its recommendations.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-61000-4_11

2502.1023

Country: Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.97)

Add feedback

The Role of Natural Language Processing Tasks in Automatic Literary Character Network Construction

Amalvy, Arthur, Labatut, Vincent, Dufour, Richard

arXiv.org Artificial IntelligenceDec-16-2024

The automatic extraction of character networks from literary texts is generally carried out using natural language processing (NLP) cascading pipelines. While this approach is widespread, no study exists on the impact of low-level NLP tasks on their performance. In this article, we conduct such a study on a literary dataset, focusing on the role of named entity recognition (NER) and coreference resolution when extracting co-occurrence networks. To highlight the impact of these tasks' performance, we start with gold-standard annotations, progressively add uniformly distributed errors, and observe their impact in terms of character network quality. We demonstrate that NER performance depends on the tested novel and strongly affects character detection. We also show that NER-detected mentions alone miss a lot of character co-occurrences, and that coreference resolution is needed to prevent this. Finally, we present comparison points with 2 methods based on large language models (LLMs), including a fully end-to-end one, and show that these models are outperformed by traditional NLP pipelines in terms of recall.

artificial intelligence, large language model, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.1156

Country: Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback

Robustness investigation of quality measures for the assessment of machine learning models

Most, Thomas, Gräning, Lars, Wolff, Sebastian

arXiv.org Artificial IntelligenceAug-8-2024

In this paper the accuracy and robustness of quality measures for the assessment of machine learning models are investigated. The prediction quality of a machine learning model is evaluated model-independent based on a cross-validation approach, where the approximation error is estimated for unknown data. The presented measures quantify the amount of explained variation in the model prediction. The reliability of these measures is assessed by means of several numerical examples, where an additional data set for the verification of the estimated prediction error is available. Furthermore, the confidence bounds of the presented quality measures are estimated and local quality measures are derived from the prediction residuals obtained by the cross-validation approach.

approximation model, quality measure, support point, (16 more...)

arXiv.org Artificial Intelligence

2408.04391

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany (0.05)
Europe > United Kingdom > England (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback