Robust Semi-Supervised CT Radiomics for Lung Cancer Prognosis: Cost-Effective Learning with Limited Labels and SHAP Interpretation
Salmanpour, Mohammad R., Pouria, Amir Hossein, Falahati, Sonia, Taeb, Shahram, Mehrnia, Somayeh Sadat, Maghsudi, Mehdi, Jouzdani, Ali Fathi, Oveisi, Mehrdad, Hacihaliloglu, Ilker, Rahmim, Arman
arXiv.org Artificial Intelligence
Background: CT imaging is vital for lung cancer management, offering detailed visualization for AI-based prognosis. However, supervised learning (SL) models require large labeled datasets, which limits their real-world application in settings with scarce annotations. Methods: We analyzed CT scans from 977 patients across 12 datasets, extracting 1218 radiomics features using Laplacian of Gaussian and wavelet filters via PyRadiomics. Dimensionality reduction was applied with 56 feature selection and extraction algorithms, and 27 classifiers were benchmarked. A semi-supervised learning (SSL) framework with pseudo-labeling used 478 unlabeled and 499 labeled cases. Model sensitivity was tested in three scenarios: varying the labeled data in SL, increasing the unlabeled data in SSL, and scaling both from 10% to 100%. SHAP analysis was used to interpret predictions. Cross-validation and external testing in two cohorts were performed. Results: SSL outperformed SL, improving overall survival prediction by up to 17%. The top SSL model, Random Forest plus XGBoost classifier, achieved 0.90 accuracy in cross-validation and 0.88 in external testing. SHAP analysis revealed enhanced feature discriminability in both SSL and SL, especially for Class 1 (survival greater than 4 years). SSL performed strongly with only 10% labeled data, giving more stable results than SL and lower variance across external testing, which highlights SSL's robustness and cost-effectiveness. Conclusion: We introduced a cost-effective, stable, and interpretable SSL framework for CT-based survival prediction in lung cancer, improving performance, generalizability, and clinical readiness by integrating SHAP explainability and leveraging unlabeled data.
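The pseudo-labeling step mentioned in the Methods can be illustrated with a minimal, generic sketch. It is not the authors' pipeline: a simple nearest-centroid classifier stands in for the benchmarked classifiers, the data are synthetic two-dimensional points rather than radiomics features, and the names `pseudo_label`, `threshold`, and `max_rounds` (as well as the 0.9 confidence cutoff) are assumptions chosen for illustration only.

```python
import math
import random


def centroids(X, y):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_proba(cents, x):
    """Softmax over negative Euclidean distances to the class centroids."""
    classes = sorted(cents)
    dists = [math.dist(cents[c], x) for c in classes]
    nearest = min(dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - nearest)) for d in dists]
    total = sum(weights)
    return {c: w / total for c, w in zip(classes, weights)}


def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Iteratively move confidently predicted unlabeled cases into the training set.

    threshold and max_rounds are hypothetical settings, not values from the paper.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        model = centroids(X_train, y_train)
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(model, x)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                # Accept the prediction as a pseudo-label.
                X_train.append(x)
                y_train.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough is left to add
            break
    # Refit on labeled plus pseudo-labeled cases.
    return centroids(X_train, y_train), len(pool)


# Synthetic toy data (not the study's cohorts): two Gaussian blobs.
random.seed(0)


def blob(cx, cy, n):
    return [[random.gauss(cx, 1.0), random.gauss(cy, 1.0)] for _ in range(n)]


X_lab = blob(0, 0, 5) + blob(4, 4, 5)
y_lab = [0] * 5 + [1] * 5
X_unlab = blob(0, 0, 50) + blob(4, 4, 50)

model, n_left = pseudo_label(X_lab, y_lab, X_unlab)
probs = predict_proba(model, [0.0, 0.0])
print("predicted class at origin:", max(probs, key=probs.get))
print("unlabeled cases left without a pseudo-label:", n_left)
```

The confidence threshold controls the trade-off between adding more unlabeled cases and admitting wrong pseudo-labels. In practice, a library routine such as scikit-learn's `SelfTrainingClassifier` wraps a similar loop around any probabilistic classifier.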
Jul-16-2025
- Country:
  - Asia > Middle East > Iran
    - Gilan Province > Rasht (0.04)
    - Hamadan Province > Hamadan (0.04)
    - Tehran Province > Tehran (0.04)
  - North America > Canada
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
- Industry:
  - Health & Medicine
    - Diagnostic Medicine > Imaging (1.00)
    - Therapeutic Area
      - Oncology > Lung Cancer (1.00)
      - Pulmonary/Respiratory Diseases (1.00)
- Technology: