AITopics

2506.1954

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Montana (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
(3 more...)

Chattopadhyay, Souradeep, Basulto-Elias, Guillermo, Chang, Jun Ha, Rizzo, Matthew, Hallmark, Shauna, Sharma, Anuj, Sarkar, Soumik

Predicting Mild Cognitive Impairment Using Naturalistic Driving and Trip Destination Modeling

arXiv.org Artificial Intelligence, Jun-24-2025

Understanding the relationship between mild cognitive impairment (MCI) and driving behavior is essential for enhancing road safety, particularly among older adults. This study introduces a novel approach by incorporating specific trip destinations (such as home, work, medical appointments, social activities, and errands), using geohashing, to analyze the driving habits of older drivers in Nebraska. We employed a two-fold methodology that combines data visualization with machine learning models, including C5.0, Random Forest, and Support Vector Machines, to assess the effectiveness of these location-based variables in predicting cognitive impairment. Notably, the C5.0 model showed robust and stable performance, achieving a median recall of 0.68, meaning that it correctly identified 68% of the drivers with cognitive impairment. This emphasizes our model's capacity to limit false negatives, a crucial factor given the serious implications of failing to identify impaired drivers. Our findings underscore the value of life-space variables in understanding and predicting cognitive decline, offering avenues for early intervention and tailored support for affected individuals.
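As a rough illustration of the geohashing step mentioned in this abstract, the sketch below encodes a latitude/longitude pair with the standard public geohash scheme. The coordinates, the precision of 6, and the variable names are hypothetical; the abstract does not say which precision or labeling procedure the study used.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # value lies in the upper half
            rng[0] = mid
        else:
            bits <<= 1  # value lies in the lower half
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# Hypothetical trip end point (assumed coordinates, for illustration only)
destination_cell = geohash(41.2565, -95.9345, precision=6)
print(destination_cell)
```

Trip end points that fall in the same cell receive the same string, so a cell can then be labeled once (home, work, medical, and so on) and reused as a categorical feature.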

artificial intelligence, decision tree learning, machine learning, (17 more...)

2504.09027

Country: North America > United States > Nebraska (0.35)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Machine Learning, Jun-23-2025

TRUST: Transparent, Robust and Ultra-Sparse Trees

Dorador, Albert

Piecewise-constant regression trees remain popular for their interpretability, yet often lag behind black-box models like Random Forest in predictive accuracy. In this work, we introduce TRUST (Transparent, Robust, and Ultra-Sparse Trees), a novel regression tree model that combines the accuracy of Random Forests with the interpretability of shallow decision trees and sparse linear models. TRUST further enhances transparency by leveraging Large Language Models to generate tailored, user-friendly explanations. Extensive validation on synthetic and real-world benchmark datasets demonstrates that TRUST consistently outperforms other interpretable models (including CART, Lasso, and Node Harvest) in predictive accuracy, while matching the accuracy of Random Forest and offering substantial gains in both accuracy and interpretability over M5', a well-established model that is conceptually related.

artificial intelligence, lasso, machine learning, (17 more...)

2506.15791

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > California (0.04)
Asia > Vietnam (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Artificial Intelligence, Jun-16-2025

Measuring multi-calibration

Guy, Ido, Haimovich, Daniel, Linder, Fridolin, Okati, Nastaran, Perini, Lorenzo, Tax, Niek, Tygert, Mark

A suitable scalar metric can help measure multi-calibration, defined as follows. When the expected values of observed responses are equal to corresponding predicted probabilities, the probabilistic predictions are known as "perfectly calibrated." When the predicted probabilities are perfectly calibrated simultaneously across several subpopulations, the probabilistic predictions are known as "perfectly multi-calibrated." In practice, predicted probabilities are seldom perfectly multi-calibrated, so a statistic measuring the distance from perfect multi-calibration is informative. A recently proposed metric for calibration, based on the classical Kuiper statistic, is a natural basis for a new metric of multi-calibration and avoids well-known problems of metrics based on binning or kernel density estimation. The newly proposed metric weights the contributions of different subpopulations in proportion to their signal-to-noise ratios; ablations in the data analyses demonstrate that the metric becomes noisy when the signal-to-noise ratios are omitted. Numerical examples on benchmark data sets illustrate the new metric.
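To make the idea of checking calibration per subpopulation concrete, here is a simplified sketch that computes a Kuiper-style range of cumulative differences between outcomes and predicted probabilities within each group. It is not the paper's metric: the signal-to-noise weighting the abstract describes is omitted, and the function name, group names, and toy data are all assumptions for illustration.

```python
def cumulative_gap(probs, outcomes):
    """Range (max minus min) of the cumulative differences between outcomes
    and predicted probabilities, taken in order of predicted probability."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    running = highest = lowest = 0.0  # the cumulative sum starts at zero
    for p, y in pairs:
        running += (y - p) / n
        highest = max(highest, running)
        lowest = min(lowest, running)
    return highest - lowest


# Hypothetical subpopulations: (predicted probabilities, observed 0/1 responses)
groups = {
    "a": ([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]),  # half the outcomes are positive
    "b": ([0.5, 0.5, 0.5, 0.5], [1, 1, 1, 1]),  # every outcome is positive
}
gaps = {name: cumulative_gap(p, y) for name, (p, y) in groups.items()}
print(gaps)
```

Group `b`, where a prediction of 0.5 understates an observed rate of 1.0, yields the larger gap. With only four observations per group the statistic is noisy, which is one reason a weighting by signal-to-noise ratio is plausible when combining groups.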

calibration, data mining, machine learning, (16 more...)

2506.11251

Country:

North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.47)

Industry: Government > Regional Government > North America Government > United States Government (0.30)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Aguilera-Martínez, Francisco, Berzal, Fernando

Differential Privacy in Machine Learning: From Symbolic AI to LLMs

arXiv.org Artificial Intelligence, Jun-16-2025

Machine learning models should not reveal particular information that is not otherwise accessible. Differential privacy (DP) provides a formal framework to mitigate privacy risks by ensuring that the inclusion or exclusion of any single data point does not significantly alter the output of an algorithm, thus limiting the exposure of private information. This survey paper explores the foundational definitions of differential privacy, reviewing its original formulations and tracing its evolution through key research contributions. It then provides an in-depth examination of how DP has been integrated into machine learning models, analyzing existing proposals and methods to preserve privacy when training ML models. It also describes how DP-based ML techniques can be evaluated in practice. Finally, it discusses the broader implications of DP, highlighting its potential for public benefit, its real-world applications, and the challenges it faces, including vulnerabilities to adversarial attacks. By offering a comprehensive overview of differential privacy in machine learning, this work aims to contribute to the ongoing development of secure and responsible AI systems.
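The guarantee described above (one record's presence or absence barely changes the output) is commonly illustrated with the Laplace mechanism on a counting query. The sketch below is a textbook example under stated assumptions, not a method taken from the survey: the helper names, the toy records, and the choice of epsilon = 1.0 are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    # Each exponential has mean `scale`, so their difference is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng):
    """Counting query released with epsilon-differential privacy."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon suffices.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
ages = [34, 71, 68, 45, 80, 59]  # hypothetical records
noisy = private_count(ages, lambda a: a >= 65, epsilon=1.0, rng=rng)
print(noisy)
```

A smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy.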

artificial intelligence, machine learning, neural information processing system, (20 more...)

2506.11687

Country:

North America > United States > New York > New York County > New York City (0.15)
Europe > Austria > Vienna (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.14)
(39 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(6 more...)

Paul, Bidyarthi, Rahman, SM Musfiqur, Biswas, Dipta, Hasan, Md. Ziaul, Hossain, Md. Zahid

Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME

arXiv.org Artificial Intelligence, Jun-13-2025

Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models (Linear SVM, KNN, and Random Forest) with n-gram features from a TF-IDF vectorizer. We also investigate how dimensionality reduction with PCA affects the results. Moreover, we use a BiLSTM model, as well as AdaBoost to improve decision trees. To make our machine learning models easier to understand, we use LIME to explain the predictions of the AdaBoost classifier, which uses decision trees. With the goal of advancing sentiment analysis in languages with limited resources, our work compares these techniques to find efficient approaches to emotion identification in Bangla.

artificial intelligence, machine learning, natural language, (18 more...)

2506.10154

Country: Asia > Bangladesh (0.15)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
(2 more...)

arXiv.org Artificial Intelligence, Jun-13-2025

On feature selection in double-imbalanced data settings: a Random Forest approach

Demaria, Fabio

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional asymmetry in the data ( n p). In such scenarios, traditional feature selection methods applied to Random Forests (RF) often yield unstable or misleading importance rankings. This paper proposes a novel thresholding scheme for feature selection based on minimal depth, which exploits the tree topology to assess variable relevance. Extensive experiments on simulated and real-world datasets demonstrate that the proposed approach produces more parsimonious and accurate subsets of variables compared to conventional minimal depth-based selection. The method provides a practical and interpretable solution for variable selection in RF under double imbalance conditions.

Keywords: Class imbalance, Double-Imbalance settings, Feature selection, Random Forests.

1. Introduction

Class imbalance is a prevalent issue in machine learning, occurring when one class is significantly underrepresented relative to others in the target variable.
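For readers unfamiliar with minimal depth, the sketch below shows the basic quantity on a single toy tree: a feature's minimal depth is the depth of its shallowest split, with the root at depth 0. The tuple-based tree encoding and the feature names are assumptions made for illustration, and the thresholding scheme proposed in the paper is not reproduced here.

```python
def minimal_depths(tree, depth=0, found=None):
    """Map each feature to the depth of its shallowest split (root = 0).

    A tree is either None (a leaf) or a tuple (feature, left, right).
    """
    if found is None:
        found = {}
    if tree is None:  # leaves carry no split
        return found
    feature, left, right = tree
    if feature not in found or depth < found[feature]:
        found[feature] = depth
    minimal_depths(left, depth + 1, found)
    minimal_depths(right, depth + 1, found)
    return found


# Hypothetical tree: x1 splits at the root and again deeper down
toy_tree = (
    "x1",
    ("x2", None, ("x1", None, None)),
    ("x3", ("x2", None, None), None),
)
print(minimal_depths(toy_tree))
```

In a forest, these per-tree values are typically averaged over trees, and features with a small average minimal depth are treated as more relevant.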

artificial intelligence, imbalance, machine learning, (18 more...)

2506.10929

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Mayala, Moria, Scornet, Erwan, Tillier, Charles, Wintenberger, Olivier

Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification

arXiv.org Machine Learning, Jun-11-2025

Many classification tasks involve imbalanced data, in which one class is largely underrepresented. Several techniques consist in creating a rebalanced dataset on which a classifier is trained. In this paper, we study such a procedure theoretically when the classifier is a Centered Random Forest (CRF). We establish a Central Limit Theorem (CLT) for the infinite CRF (ICRF) with explicit rates and exact constants. We then prove that the CRF trained on the rebalanced dataset exhibits a bias, which can be removed with appropriate techniques. Based on an importance sampling (IS) approach, the resulting debiased estimator, called IS-ICRF, satisfies a CLT centered at the prediction function value. For high imbalance settings, we prove that the IS-ICRF estimator enjoys a variance reduction compared to the ICRF trained on the original data. Therefore, our theoretical analysis highlights the benefits of training random forests on a rebalanced dataset (followed by a debiasing procedure) compared to using the original data. Our theoretical results, especially the variance rates and the variance reduction, appear to be valid for Breiman's random forests in our experiments.
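The bias introduced by rebalancing can be illustrated with the standard prior-shift correction for a binary classifier, which reweights a predicted probability by the ratio of original to rebalanced class priors. This is a generic textbook correction offered as a sketch; it is not the IS-ICRF estimator from the paper, and the function name and numbers are hypothetical.

```python
def correct_for_rebalancing(p_rebalanced, prior_original, prior_rebalanced):
    """Map a positive-class probability learned on rebalanced data back to
    the original class prior using importance weights on each class."""
    w_pos = prior_original / prior_rebalanced
    w_neg = (1 - prior_original) / (1 - prior_rebalanced)
    numerator = p_rebalanced * w_pos
    return numerator / (numerator + (1 - p_rebalanced) * w_neg)


# Assumed setting: positives are 10% of the original data, 50% after rebalancing
corrected = correct_for_rebalancing(0.5, prior_original=0.1, prior_rebalanced=0.5)
print(corrected)
```

A score of 0.5 on the balanced data corresponds to a probability close to the original base rate, which shows why uncorrected predictions from a rebalanced model overstate the minority class.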

artificial intelligence, machine learning, theorem 2, (17 more...)

2506.08548

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Japan > Honshū > Chūgoku > Hiroshima Prefecture > Hiroshima (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Liang, Zhongyuan, Rewolinski, Zachary T., Agarwal, Abhineet, Tang, Tiffany M., Yu, Bin

Local MDI+: Local Feature Importances for Tree-Based Models

arXiv.org Machine Learning, Jun-11-2025

Tree-based ensembles such as random forests remain the go-to for tabular data over deep learning models due to their prediction performance and computational efficiency. These advantages have led to their widespread deployment in high-stakes domains, where interpretability is essential for ensuring trustworthy predictions. This has motivated the development of popular local (i.e. sample-specific) feature importance (LFI) methods such as LIME and TreeSHAP. However, these approaches rely on approximations that ignore the model's internal structure and instead depend on potentially unstable perturbations. These issues are addressed in the global setting by MDI+, a feature importance method which exploits an equivalence between decision trees and linear models on a transformed node basis. However, the global MDI+ scores are not able to explain predictions when faced with heterogeneous individual characteristics. To address this gap, we propose Local MDI+ (LMDI+), a novel extension of the MDI+ framework to the sample specific setting. LMDI+ outperforms existing baselines LIME and TreeSHAP in identifying instance-specific signal features, averaging a 10% improvement in downstream task performance across twelve real-world benchmark datasets. It further demonstrates greater stability by consistently producing similar instance-level feature importance rankings across multiple random forest fits. Finally, LMDI+ enables local interpretability use cases, including the identification of closer counterfactuals and the discovery of homogeneous subgroups.

artificial intelligence, machine learning, signal feature, (20 more...)

2506.08928

Country: North America > United States > Florida > Miami-Dade County (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Nguyen, Anh V, Klabjan, Diego

FedGA-Tree: Federated Decision Tree using Genetic Algorithm

arXiv.org Artificial Intelligence, Jun-11-2025

In recent years, with rising concerns for data privacy, Federated Learning has gained prominence, as it enables collaborative training without the aggregation of raw data from participating clients. However, much of the current focus has been on parametric gradient-based models, while nonparametric counterparts such as decision trees are relatively understudied. Existing methods for adapting decision trees to Federated Learning generally combine a greedy tree-building algorithm with differential privacy to produce a global model for all clients. These methods are limited to classification trees and categorical data due to the constraints of differential privacy. In this paper, we explore an alternative approach that utilizes a Genetic Algorithm to facilitate the construction of personalized decision trees and accommodate categorical and numerical data, thus allowing for both classification and regression trees. Comprehensive experiments demonstrate that our method surpasses decision trees trained solely on local data and a benchmark algorithm.

With the rapid advancement of AI and machine learning, there are many concerns about data usage and privacy. Lawmakers worldwide have attempted to create incentives for companies to focus more on privacy in their model development, with key examples including the General Data Protection Regulation implemented by the European Union and the California Consumer Privacy Act. Federated Learning (FL) was introduced by Google as an approach for mobile devices to collaboratively solve a machine learning problem without sharing users' local data [14], [17]. In the FL framework, multiple clients contribute to solving a machine learning problem while maintaining their data locally. A global server helps aggregate information that clients deem fit to share, such as model weights, and constructs an improved model. The two main scenarios of data distribution in FL are horizontal and vertical. In the former, clients have the same features but different sets of samples, while in the latter, clients have different features but the same set of samples. Currently, the main focus of the FL research community is on parametric, gradient-based models, yet there is an expanding body of literature that explores the use of decision tree models [7], [19], [25], [26].
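The difference between horizontal and vertical data distribution can be sketched with a toy table split across two clients. The record fields, client count, and helper names below are hypothetical and only illustrate the two partitioning schemes, not the FedGA-Tree algorithm itself.

```python
def horizontal_split(rows, n_clients):
    """Each client holds different samples but the same full set of features."""
    return [rows[i::n_clients] for i in range(n_clients)]


def vertical_split(rows, feature_groups):
    """Each client holds the same samples but a different subset of features.

    The shared "id" key is kept on every client so records can be aligned.
    """
    return [
        [{key: row[key] for key in ("id", *group)} for row in rows]
        for group in feature_groups
    ]


# Hypothetical dataset of four records
rows = [
    {"id": 1, "age": 34, "income": 52, "label": 0},
    {"id": 2, "age": 71, "income": 38, "label": 1},
    {"id": 3, "age": 45, "income": 61, "label": 0},
    {"id": 4, "age": 80, "income": 29, "label": 1},
]

horizontal = horizontal_split(rows, n_clients=2)
vertical = vertical_split(rows, feature_groups=[("age",), ("income", "label")])
print(horizontal[0])
print(vertical[0])
```

In the horizontal case each client could train a tree on its own rows, whereas in the vertical case no single client sees every feature of a record.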

artificial intelligence, decision tree learning, machine learning, (17 more...)

2506.08176

Country: North America > United States > California (0.24)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)