Precision Cancer Classification and Biomarker Identification from mRNA Gene Expression via Dimensionality Reduction and Explainable AI
Tabassum, Farzana; Islam, Sabrina; Rizwan, Siana; Sobhan, Masrur; Ahmed, Tasnim; Ahmed, Sabbir; Chowdhury, Tareque Mohmud
arXiv.org Artificial Intelligence
Gene expression analysis is a critical method for cancer classification, enabling precise diagnoses through the identification of unique molecular signatures associated with various tumors. Identifying cancer-specific genes from gene expression values enables a more tailored and personalized treatment approach. However, the high dimensionality of mRNA gene expression data poses challenges for analysis and for extracting meaningful information. This research presents a comprehensive pipeline designed to accurately identify 33 distinct cancer types and their corresponding gene sets. It combines normalization and feature selection techniques to reduce the dimensionality of the dataset effectively while maintaining high classification performance. Notably, our pipeline identifies a substantial number of cancer-specific genes using a reduced set of just 500 features, rather than the full set of 19,238 features. An ensemble that combines the three top-performing classifiers achieves a classification accuracy of 96.61%. Furthermore, we leverage Explainable AI, together with Differential Gene Expression (DGE) analysis, to elucidate the biological significance of the identified cancer-specific genes.
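The abstract does not name the normalization method, the feature-selection criterion, or the three classifiers, so the following is only a minimal, stdlib-only Python sketch of the general shape of such a pipeline under stated assumptions. It assumes a log2(x + 1) transform, ranks genes by a one-way ANOVA F-statistic and keeps the top k (k = 500 in the paper's setting), and combines three classifiers' predictions by hard majority voting. The helper names `log_normalize`, `anova_f_score`, `select_top_k`, and `majority_vote` are hypothetical and should not be read as the authors' implementation.

```python
import math
from collections import Counter
from statistics import mean


def log_normalize(matrix):
    """Assumed normalization: log2(x + 1) applied to every expression value."""
    return [[math.log2(value + 1.0) for value in row] for row in matrix]


def anova_f_score(column, labels):
    """One-way ANOVA F-statistic of a single gene across the class labels."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(label, []).append(value)
    k, n = len(groups), len(column)
    if k < 2:
        return 0.0  # a single class carries no discriminative signal
    grand_mean = mean(column)
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if within == 0:
        # Zero within-class spread: perfectly separating, or constant if between == 0
        return math.inf if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_top_k(matrix, labels, k):
    """Assumed selection rule: keep the indices of the k highest-scoring genes."""
    n_genes = len(matrix[0])
    scores = [anova_f_score([row[j] for row in matrix], labels) for j in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def majority_vote(*predictions):
    """Hard-voting ensemble: most common label per sample; ties go to the first classifier."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): 4 samples x 3 genes, two tumor classes
expression = [
    [0, 10, 5],
    [1, 12, 5],
    [100, 11, 5],
    [120, 9, 5],
]
labels = ["BRCA", "BRCA", "LUAD", "LUAD"]
normalized = log_normalize(expression)
print(select_top_k(normalized, labels, k=1))  # → [0]

# Predictions from three hypothetical classifiers on three samples
print(majority_vote(["A", "B", "A"], ["A", "A", "B"], ["B", "B", "B"]))  # → ['A', 'B', 'B']
```

In practice a library such as scikit-learn offers equivalents (`SelectKBest` with `f_classif`, and `VotingClassifier`); the hand-rolled version here only makes each step explicit.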
Oct-8-2024
- Country:
- Asia
- Armenia (0.04)
- Bangladesh > Dhaka Division
- Dhaka District > Dhaka (0.05)
- Europe > Netherlands
- North Holland > Amsterdam (0.04)
- North America
- Canada > Ontario (0.04)
- United States > Florida
- Miami-Dade County > Miami (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Technology:
- Information Technology
- Artificial Intelligence > Machine Learning
- Neural Networks > Deep Learning (1.00)
- Performance Analysis > Accuracy (0.88)
- Statistical Learning (1.00)
- Data Science > Data Mining (1.00)