AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.65)

Salgado, Henry, Kendall, Meagan R., Ceberio, Martine

Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment

arXiv.org Artificial Intelligence, Dec-9-2025

In this work, we propose a simple and computationally efficient framework for evaluating whether machine learning models align with the structure of the data they learn from; that is, whether the model says what the data says. Unlike existing interpretability methods that focus exclusively on explaining model behavior, our approach establishes a baseline derived directly from the data itself. Drawing inspiration from Rubin's Potential Outcomes Framework, we quantify how strongly each feature separates the two outcome groups in a binary classification task, moving beyond traditional descriptive statistics to estimate each feature's effect on the outcome. By comparing these data-derived feature rankings with model-based explanations, we provide practitioners with an interpretable and model-agnostic method for assessing model-data alignment.
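The comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's estimator, so the absolute standardized mean difference used here is a stand-in, not the authors' method. The toy data, the `model_importance` values, and all function names are hypothetical.

```python
from statistics import mean, stdev

def separation_scores(rows, labels):
    """Absolute standardized mean difference of each feature between the two outcome groups."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r in pos]
        b = [r[j] for r in neg]
        pooled = ((stdev(a) ** 2 + stdev(b) ** 2) / 2) ** 0.5
        scores.append(abs(mean(a) - mean(b)) / pooled if pooled > 0 else 0.0)
    return scores

def ranks(scores):
    """Rank 1 goes to the largest score; ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    result = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        result[j] = position
    return result

def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Toy binary-classification data (hypothetical): feature 0 separates the
# groups strongly, feature 1 weakly, feature 2 not at all.
rows = [
    [5.0, 1.0, 0.3],
    [5.2, 1.4, 0.1],
    [4.9, 0.8, 0.2],
    [1.0, 1.1, 0.2],
    [1.3, 0.7, 0.3],
    [0.8, 0.9, 0.1],
]
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical model-based importances, e.g. from any explanation method.
model_importance = [0.2, 0.7, 0.1]

data_ranks = ranks(separation_scores(rows, labels))
model_ranks = ranks(model_importance)
alignment = spearman(data_ranks, model_ranks)
print(data_ranks, model_ranks, alignment)
```

On this toy input the data ranks feature 0 first while the hypothetical model ranks feature 1 first, so the rank correlation is 0.5 rather than 1. A low value would flag a model whose explanations diverge from the structure of the data.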

alignment, artificial intelligence, machine learning, (16 more...)

2511.21931

Country:

North America > United States > Texas (0.15)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (0.98)
Health & Medicine > Diagnostic Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Anderson, Joshua Wolff, Visweswaran, Shyam

The Effect of Enforcing Fairness on Reshaping Explanations in Machine Learning Models

arXiv.org Artificial Intelligence, Dec-3-2025

Trustworthy machine learning in healthcare requires strong predictive performance, fairness, and explanations. While it is known that improving fairness can affect predictive performance, little is known about how fairness improvements influence explainability, an essential ingredient for clinical trust. Clinicians may hesitate to rely on a model whose explanations shift after fairness constraints are applied. In this study, we examine how enhancing fairness through bias mitigation techniques reshapes Shapley-based feature rankings. We quantify changes in feature importance rankings after applying fairness constraints across three datasets: pediatric urinary tract infection risk, direct anticoagulant bleeding risk, and recidivism risk. We also evaluate multiple model classes on the stability of Shapley-based rankings. We find that increasing model fairness across racial subgroups can significantly alter feature importance rankings, sometimes in different ways across groups. These results highlight the need to jointly consider accuracy, fairness, and explainability in model assessment rather than in isolation.
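One way to quantify a change in feature importance rankings is sketched below. It assumes the importances before and after mitigation are already available as per-feature scores. The feature names and numbers are hypothetical, not taken from the study, and Kendall's tau is only one of several rank-agreement measures the authors might have used.

```python
from itertools import combinations

def ranking(importance):
    """Feature names ordered from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)

def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same features (no ties)."""
    pos_a = {f: i for i, f in enumerate(order_a)}
    pos_b = {f: i for i, f in enumerate(order_b)}
    concordant = discordant = 0
    for f, g in combinations(order_a, 2):
        if (pos_a[f] - pos_a[g]) * (pos_b[f] - pos_b[g]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical mean absolute Shapley values before and after bias mitigation.
baseline = {"age": 0.40, "lab_value": 0.30, "prior_visits": 0.20, "insurance": 0.10}
mitigated = {"age": 0.35, "lab_value": 0.15, "prior_visits": 0.30, "insurance": 0.20}

baseline_order = ranking(baseline)
mitigated_order = ranking(mitigated)

tau = kendall_tau(baseline_order, mitigated_order)

# Positive shift: the feature moved down the ranking after mitigation.
rank_shift = {f: mitigated_order.index(f) - baseline_order.index(f) for f in baseline_order}
print(tau, rank_shift)
```

Computing the same quantities separately per subgroup would show whether rankings move in different ways across groups.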

artificial intelligence, dataset, machine learning, (16 more...)

2512.02265

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.70)

Curletto, Chiara, Massa, Paolo, Tagliafico, Valeria, Campi, Cristina, Benvenuto, Federico, Piana, Michele, Tacchino, Andrea

PRESOL: a web-based computational setting for feature-based flare forecasting

arXiv.org Artificial Intelligence, Oct-3-2025

Solar flares are the most explosive phenomena in the solar system and the main trigger of the chain of events that starts with coronal mass ejections and leads to geomagnetic storms, with possible impacts on infrastructure on Earth. Data-driven solar flare forecasting relies on either deep learning approaches, which are operationally promising but offer little explainability, or machine learning algorithms, which can provide information on the physical descriptors that most affect the prediction. This paper describes a web-based technological platform for executing a computational pipeline of feature-based machine learning methods that provide predictions of flare occurrence, feature ranking information, and an assessment of prediction performance.

algorithm, artificial intelligence, machine learning, (19 more...)

2510.01799

Country:

North America > United States (0.47)
Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Neural Information Processing Systems, Jan-25-2025, 18:28:57 GMT

Review for NeurIPS paper: Simple and Scalable Sparse k-means Clustering via Feature Ranking

Summary and Contributions: This paper focuses on the problem of clustering in high dimensions. K-means clustering is an extremely popular tool (especially in biomedical applications). However, as underlined by the authors, its performance is severely hindered in high-dimensional spaces, leaving the data analyst no choice but to (a) apply some dimensionality reduction technique before performing the clustering or (b) select the features that are most informative for the clustering and apply k-means to that subset. This paper proposes a version of the latter approach, choosing a sparse and interpretable subset of features. The setting is the following.
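The excerpt ends before the setting is stated, so the sketch below is not the paper's formulation. It only illustrates option (b) in general terms: cluster, rank features by how much the cluster means deviate from zero on standardized data, keep the top `s`, and cluster again on that subset. The scoring rule, the single ranking pass, the fixed `init` indices, and all function names are assumptions made for this example.

```python
from statistics import mean, pstdev

def standardize(X):
    """Center each feature to mean 0 and scale to unit standard deviation."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in X]

def kmeans(Z, feats, init, iters=20):
    """Lloyd's algorithm restricted to the feature indices in `feats`."""
    centers = [[Z[i][j] for j in feats] for i in init]
    labels = None
    for _ in range(iters):
        new = [
            min(range(len(centers)),
                key=lambda k: sum((row[j] - c) ** 2 for j, c in zip(feats, centers[k])))
            for row in Z
        ]
        if new == labels:
            break
        labels = new
        for k in range(len(centers)):
            members = [row for row, lab in zip(Z, labels) if lab == k]
            if members:
                centers[k] = [mean([r[j] for r in members]) for j in feats]
    return labels

def feature_scores(Z, labels, k):
    """Score feature j by sum over clusters of n_k * (cluster mean of j)^2."""
    scores = [0.0] * len(Z[0])
    for c in range(k):
        members = [row for row, lab in zip(Z, labels) if lab == c]
        for j in range(len(scores)):
            if members:
                scores[j] += len(members) * mean([r[j] for r in members]) ** 2
    return scores

def sparse_kmeans(X, k, s, init):
    """Cluster on all features, keep the s top-ranked ones, then cluster again."""
    Z = standardize(X)
    labels = kmeans(Z, list(range(len(Z[0]))), init)
    scores = feature_scores(Z, labels, k)
    keep = sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:s])
    return keep, kmeans(Z, keep, init)

# Toy data (hypothetical): features 0 and 1 carry the cluster structure,
# feature 2 is noise.
X = [
    [0.0, 0.1, 5.0], [0.2, 0.0, 1.0], [0.1, 0.2, 3.0],
    [4.0, 4.1, 1.2], [4.2, 3.9, 4.8], [3.9, 4.0, 3.1],
]
selected, cluster_labels = sparse_kmeans(X, k=2, s=2, init=[0, 3])
print(selected, cluster_labels)
```

Initial centers are passed explicitly to keep the example deterministic; a practical version would use a randomized or k-means++ style initialization and repeat the rank-and-select step until the selected subset stabilizes.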

feature ranking, scalable sparse k-means clustering, simple and scalable sparse, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)

Neural Information Processing Systems, Jan-25-2025, 18:28:49 GMT

Review for NeurIPS paper: Simple and Scalable Sparse k-means Clustering via Feature Ranking

The reviewers appreciate the algorithmic contributions of this paper and believe it will be of interest to the community.

feature ranking, scalable sparse k-means clustering, simple and scalable sparse, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Neural Information Processing Systems, Oct-10-2024, 12:58:22 GMT

Simple and Scalable Sparse k-means Clustering via Feature Ranking

feature ranking, scalable sparse k-means clustering, simple and scalable sparse, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Zaccaria, Valentina, Dandolo, David, Masiero, Chiara, Susto, Gian Antonio

AcME-AD: Accelerated Model Explanations for Anomaly Detection

arXiv.org Artificial Intelligence, Mar-2-2024

Pursuing fast and robust interpretability in Anomaly Detection is crucial, especially due to its significance in practical applications. Traditional Anomaly Detection methods excel in outlier identification but are often black-boxes, providing scant insights into their decision-making process. This lack of transparency compromises their reliability and hampers their adoption in scenarios where comprehending the reasons behind anomaly detection is vital. At the same time, getting explanations quickly is paramount in practical scenarios. To bridge this gap, we present AcME-AD, a novel approach rooted in Explainable Artificial Intelligence principles, designed to clarify Anomaly Detection models for tabular data. AcME-AD transcends the constraints of model-specific or resource-heavy explainability techniques by delivering a model-agnostic, efficient solution for interoperability. It offers local feature importance scores and a what-if analysis tool, shedding light on the factors contributing to each anomaly, thus aiding root cause analysis and decision-making. This paper elucidates AcME-AD's foundation, its benefits over existing methods, and validates its effectiveness with tests on both synthetic and real datasets. AcME-AD's implementation and experiment replication code is accessible in a public repository.

acme-ad, anomaly score, explanation, (14 more...)

2403.01245

Country:

Europe > Italy (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Promising Solution (0.66)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial Intelligence, Sep-20-2023

Combining low-dose CT-based radiomics and metabolomics for early lung cancer screening support

Zyla, Joanna, Marczyk, Michal, Prazuch, Wojciech, Socha, Marek, Suwalska, Aleksandra, Durawa, Agata, Jelitto-Gorska, Malgorzata, Dziadziuszko, Katarzyna, Szurowska, Edyta, Rzyman, Witold, Widlak, Piotr, Polanska, Joanna

Due to its predominantly asymptomatic or mildly symptomatic progression, lung cancer is often diagnosed in advanced stages, resulting in poorer survival rates for patients. As with other cancers, early detection significantly improves the chances of successful treatment. Early diagnosis can be facilitated through screening programs designed to detect lung tissue tumors when they are still small, typically around 3mm in size. However, the analysis of extensive screening program data is hampered by limited access to medical experts. In this study, we developed a procedure for identifying potential malignant neoplastic lesions within lung parenchyma. The system leverages machine learning (ML) techniques applied to two types of measurements: low-dose Computed Tomography-based radiomics and metabolomics. Using data from two Polish screening programs, two ML algorithms were tested, along with various integration methods, to create a final model that combines both modalities to support lung cancer screening.

lung cancer, modality, procedure, (10 more...)

2311.1281

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.14)
Europe > Poland > Pomerania Province > Gdańsk (0.05)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Škrlj, Blaž, Mramor, Blaž

OutRank: Speeding up AutoML-based Model Search for Large Sparse Data sets with Cardinality-aware Feature Ranking

arXiv.org Artificial Intelligence, Sep-4-2023

The design of modern recommender systems relies on understanding which parts of the feature space are relevant for solving a given recommendation task. However, real-world data sets in this domain are often characterized by their large size, sparsity, and noise, making it challenging to identify meaningful signals. Feature ranking represents an efficient branch of algorithms that can help address these challenges by identifying the most informative features and facilitating the automated search for more compact and better-performing models (AutoML). We introduce OutRank, a system for versatile feature ranking and data quality-related anomaly detection. OutRank was built with categorical data in mind, utilizing a variant of mutual information that is normalized with regard to the noise produced by features of the same cardinality. We further extend the similarity measure by incorporating information on feature similarity and combined relevance. The proposed approach's feasibility is demonstrated by speeding up the state-of-the-art AutoML system on a synthetic data set with no performance loss. Furthermore, we considered a real-life click-through-rate prediction data set where it outperformed strong baselines such as random forest-based approaches. The proposed approach enables exploration of up to 300% larger feature spaces compared to AutoML-only approaches, enabling faster search for better models on off-the-shelf hardware.
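The idea of normalizing mutual information against the noise of same-cardinality features can be sketched as follows. This is one possible reading of the abstract, not OutRank's actual implementation: the noise level is estimated here by averaging the mutual information of random features with the same number of distinct values and subtracting it. The function names, the `trials` parameter, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def noise_adjusted_mi(xs, ys, trials=50, seed=0):
    """Raw MI minus the average MI of random features with the same cardinality."""
    rng = random.Random(seed)
    cardinality = len(set(xs))
    noise = 0.0
    for _ in range(trials):
        fake = [rng.randrange(cardinality) for _ in xs]
        noise += mutual_information(fake, ys)
    return mutual_information(xs, ys) - noise / trials

# Toy click data (hypothetical): a balanced binary target, a low-cardinality
# feature that matches it, and a unique-ID feature with one value per row.
ys = [0, 1] * 50
signal = list(ys)
ids = list(range(100))

raw_signal = mutual_information(signal, ys)
raw_ids = mutual_information(ids, ys)
adj_signal = noise_adjusted_mi(signal, ys)
adj_ids = noise_adjusted_mi(ids, ys)
print(raw_signal, raw_ids, adj_signal, adj_ids)
```

Raw mutual information scores the unique-ID feature as highly as the informative one, because every ID trivially determines its row's target. After the cardinality-matched noise estimate is subtracted, the ID feature ranks below the genuinely informative feature.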

automl, interaction, outrank, (15 more...)

2309.01552

Country: North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.70)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.54)