Improving statistical learning methods via features selection without replacement sampling and random projection
Khan, Sulaiman, Ahmad, Muhammad, Ullah, Fida, Ibañez, Carlos Aguilar, Rodriguez, José Eduardo Valdez
Cancer is fundamentally a genetic disease characterized by genetic and epigenetic alterations that disrupt normal gene expression, leading to uncontrolled cell growth and metastasis. High-dimensional microarray datasets pose challenges for classification models because of the "small n, large p" problem (few samples, many features), which leads to overfitting. This study makes three key contributions: 1) We propose a machine learning-based approach integrating the Feature Selection Without Replacement (FSWOR) technique and a projection method to improve classification accuracy. 2) We apply the Kendall statistical test to identify the most significant genes in the brain cancer microarray dataset (GSE50161), reducing the feature space from 54,675 to 20,890 genes. 3) We evaluate machine learning models using k-fold cross-validation; our model combines ensemble classifiers with LDA projection and Naïve Bayes, achieving a test score of 96% and outperforming existing methods by 9.09%. The results demonstrate the effectiveness of our approach in high-dimensional gene expression analysis, improving classification accuracy while mitigating overfitting. This study contributes to cancer biomarker discovery, offering a robust computational method for analyzing microarray data.
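The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how FSWOR forms its feature subsets, how the ensemble votes, or which significance threshold the Kendall test uses. Here the features are assumed to be partitioned into disjoint subsets (sampling without replacement), each subset is assumed to get its own LDA projection and Gaussian Naive Bayes classifier, and the members are combined by averaging class probabilities. All function names, the 0.05 threshold, and the synthetic data are hypothetical stand-ins. The sketch uses NumPy, SciPy, and scikit-learn.

```python
import numpy as np
from scipy.stats import kendalltau
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB


def kendall_filter(X, y, alpha=0.05):
    """Return indices of features whose Kendall tau against y has p < alpha.

    alpha=0.05 is an assumed threshold, not one taken from the abstract.
    """
    pvalues = np.array([kendalltau(X[:, j], y).pvalue for j in range(X.shape[1])])
    return np.flatnonzero(pvalues < alpha)


def disjoint_subsets(feature_idx, n_subsets, rng):
    """Shuffle once, then split: every feature lands in exactly one subset."""
    n_subsets = max(1, min(n_subsets, len(feature_idx)))
    return np.array_split(rng.permutation(feature_idx), n_subsets)


def fit_ensemble(X, y, subsets):
    """Fit one LDA projection + Gaussian Naive Bayes pair per feature subset."""
    members = []
    for idx in subsets:
        lda = LinearDiscriminantAnalysis().fit(X[:, idx], y)
        nb = GaussianNB().fit(lda.transform(X[:, idx]), y)
        members.append((idx, lda, nb))
    return members


def predict_ensemble(members, X):
    """Average class probabilities over members (assumed soft voting)."""
    proba = np.mean(
        [nb.predict_proba(lda.transform(X[:, idx])) for idx, lda, nb in members],
        axis=0,
    )
    return members[0][2].classes_[np.argmax(proba, axis=1)]


def cross_validate(X, y, n_subsets=3, n_splits=5, seed=0):
    """Stratified k-fold accuracy; filtering is redone inside each training fold."""
    rng = np.random.default_rng(seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train, test in folds.split(X, y):
        kept = kendall_filter(X[train], y[train])
        subsets = disjoint_subsets(kept, n_subsets, rng)
        members = fit_ensemble(X[train], y[train], subsets)
        scores.append(np.mean(predict_ensemble(members, X[test]) == y[test]))
    return float(np.mean(scores))


def make_toy_data(n_per_class=30, n_features=200, n_informative=20, seed=0):
    """Synthetic stand-in for expression data; this is not GSE50161."""
    rng = np.random.default_rng(seed)
    y = np.repeat(np.arange(3), n_per_class)
    X = rng.normal(size=(y.size, n_features))
    X[:, :n_informative] += 1.5 * y[:, None]  # shift informative features by class
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(f"mean CV accuracy on toy data: {cross_validate(X, y):.3f}")
```

One design choice in this sketch is to rerun the Kendall filter inside each training fold, so that held-out samples never influence which features are kept. Whether the study filters before or within cross-validation is not stated in the abstract.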
Jun-3-2025
- Country:
- Asia > China (0.04)
- Europe > Czechia
- Prague (0.04)
- North America
- Central America (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- South America (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine
- Pharmaceuticals & Biotechnology (1.00)
- Therapeutic Area
- Neurology (1.00)
- Oncology > Brain Cancer (1.00)
- Technology: