On feature selection in double-imbalanced data settings: a Random Forest approach

Jun-13-2025–arXiv.org Artificial Intelligence

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional asymmetry in the data (a marked disparity between the sample size n and the number of features p). In such scenarios, traditional feature selection methods applied to Random Forests (RF) often yield unstable or misleading importance rankings. This paper proposes a novel thresholding scheme for feature selection based on minimal depth, which exploits the tree topology to assess variable relevance. Extensive experiments on simulated and real-world datasets demonstrate that the proposed approach produces more parsimonious and accurate subsets of variables than conventional minimal depth-based selection. The method provides a practical and interpretable solution for variable selection in RF under double imbalance conditions.

Keywords: Class imbalance, Double-Imbalance settings, Feature selection, Random Forests.

1. Introduction

Class imbalance is a prevalent issue in machine learning, occurring when one class is significantly underrepresented relative to others in the target variable.
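
To make the notion of minimal depth mentioned in the abstract concrete, the following is a minimal sketch on hand-written toy trees. It is not the paper's thresholding scheme. The tree encoding, the function names, the penalty assigned to features absent from a tree, and the placeholder threshold (the average of the per-feature mean minimal depths) are all assumptions made for illustration.

```python
# A toy tree is either a leaf (None) or a tuple (feature, left, right).
# The trees below are hand-written for illustration, not fitted to data.

def minimal_depths(tree, depth=0, found=None):
    """Return {feature: depth of its split closest to the root} for one tree."""
    if found is None:
        found = {}
    if tree is None:  # leaf: no split to record
        return found
    feature, left, right = tree
    if feature not in found or depth < found[feature]:
        found[feature] = depth
    minimal_depths(left, depth + 1, found)
    minimal_depths(right, depth + 1, found)
    return found


def forest_mean_minimal_depth(forest, features, absent_penalty):
    """Average minimal depth per feature over all trees.

    Assumption: a feature that never splits in a tree is assigned
    `absent_penalty` for that tree.
    """
    totals = {f: 0.0 for f in features}
    for tree in forest:
        depths = minimal_depths(tree)
        for f in features:
            totals[f] += depths.get(f, absent_penalty)
    return {f: totals[f] / len(forest) for f in features}


forest = [
    ("x1", ("x2", None, None), ("x3", None, None)),
    ("x2", ("x1", None, None), None),
    ("x1", None, ("x2", ("x3", None, None), None)),
]
features = ["x1", "x2", "x3", "x4"]

mean_md = forest_mean_minimal_depth(forest, features, absent_penalty=3)

# Placeholder threshold (an assumption, not the proposed scheme):
# keep features whose mean minimal depth is below the overall average.
threshold = sum(mean_md.values()) / len(mean_md)
selected = sorted(f for f, d in mean_md.items() if d < threshold)
print(mean_md, threshold, selected)
```

Features that tend to split close to the root receive a small mean minimal depth and are retained. Which threshold separates relevant from irrelevant features is exactly the design choice the paper addresses.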



Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Decision Tree Learning (1.00)
  - Ensemble Learning (0.93)
  - Performance Analysis > Accuracy (0.68)
