Robustness of Threshold-Based Feature Rankers with Data Sampling on Noisy and Imbalanced Data

Shanab, Ahmad Abu (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Wald, Randall (Florida Atlantic University)

May-20-2012–AAAI Conferences

Gene selection has become a vital component of the learning process when working with high-dimensional gene expression data. Although extensive research has evaluated the performance of classifiers trained on the selected features, the stability of feature ranking techniques has received relatively little study. This work evaluates the robustness of eleven threshold-based feature selection techniques, examining the impact of data sampling and class noise on the stability of feature selection. To assess robustness, we use four groups of gene expression datasets, apply the eleven threshold-based feature rankers, and inject artificial class noise to better simulate real-world datasets. The results demonstrate that although no ranker consistently outperforms the others, MI and Dev show the best stability on average, while GI and PR show the least. Results also show that balancing datasets through data sampling has, on average, no positive impact on the stability of the feature ranking techniques applied to them. In addition, increasing the feature subset size improves stability, but does so reliably only for noisy datasets.
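The evaluation described in the abstract (rank features on the original labels, rank again after injecting class noise, then compare the selected subsets) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `rank_features` uses a simple difference-of-class-means score as a stand-in and is not one of the paper's eleven threshold-based rankers, the Jaccard overlap in `jaccard` is a common subset-similarity measure and may differ from the stability measure the authors used, and the data sampling step is omitted.

```python
import random


def rank_features(X, y, k):
    """Return the index set of the top-k features.

    Stand-in ranker (an assumption): scores each feature by the absolute
    difference between its mean in class 1 and its mean in class 0.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to rank features")
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def inject_class_noise(y, rate, rng):
    """Flip the binary label of a randomly chosen fraction `rate` of instances."""
    noisy = list(y)
    n_flip = int(round(rate * len(y)))
    for i in rng.sample(range(len(y)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def jaccard(a, b):
    """Jaccard overlap between two non-empty feature subsets."""
    return len(a & b) / len(a | b)


def stability(X, y, k, rate, seed=0):
    """Overlap between the top-k subset on clean labels and on noisy labels."""
    rng = random.Random(seed)
    clean_subset = rank_features(X, y, k)
    noisy_subset = rank_features(X, inject_class_noise(y, rate, rng), k)
    return jaccard(clean_subset, noisy_subset)
```

Under these assumptions, calling `stability(X, y, k, rate)` for several values of `k` and `rate` would show how the overlap changes with subset size and noise level, which is the kind of comparison the abstract summarizes.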

dataset, noise, stability


Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - New Jersey > Hudson County
    - Secaucus (0.04)
  - Florida > Palm Beach County
    - Boca Raton (0.04)
  - California > Orange County
    - Anaheim (0.04)
- Europe > Ireland
  - Leinster > County Dublin > Dublin (0.04)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area > Oncology (0.94)

Technology:
- Information Technology
  - Data Science (1.00)
  - Artificial Intelligence > Machine Learning
    - Performance Analysis > Accuracy (1.00)
