Ensemble Learning
A Hybrid Random Forest and CNN Framework for Tile-Wise Oil-Water Classification in Hyperspectral Images
Nickzamir, Mehdi, Gandab, Seyed Mohammad Sheikh Ahamdi
A novel hybrid Random Forest and Convolutional Neural Network (CNN) framework is presented for oil-water classification in hyperspectral images (HSI). To address the challenge of preserving spatial context, the images were divided into smaller, non-overlapping tiles, which served as the basis for training, validation, and testing. Random Forest demonstrated strong performance in pixel-wise classification, outperforming models such as XGBoost, Attention-Based U-Net, and HybridSN. However, Random Forest loses spatial context, limiting its ability to fully exploit the spatial relationships in hyperspectral data. To improve performance, a CNN was trained on the probability maps generated by the Random Forest, leveraging the CNN's capacity to incorporate spatial context. Compared to the baseline, the hybrid approach achieved a 7.6% improvement in recall (to 0.85), a 2.4% improvement in F1 score (to 0.84), and a 0.54% improvement in AUC (to 0.99). These results highlight the effectiveness of combining probabilistic outputs with spatial feature learning for context-aware analysis of hyperspectral images.
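The tiling step described above can be illustrated with a minimal sketch. It assumes square tiles and drops partial tiles at the image edges; the abstract does not state the tile size or the edge handling, so both are assumptions, and the names `tile_bounds` and `extract_tile` are hypothetical.

```python
from typing import List, Tuple

# (row_start, row_end, col_start, col_end), with exclusive end indices
Tile = Tuple[int, int, int, int]


def tile_bounds(height: int, width: int, tile_size: int) -> List[Tile]:
    """Return bounds of non-overlapping square tiles that fit fully in the image."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    bounds: List[Tile] = []
    # Stepping by tile_size guarantees no overlap; partial edge tiles are
    # dropped (an assumption, not something the abstract specifies).
    for r in range(0, height - tile_size + 1, tile_size):
        for c in range(0, width - tile_size + 1, tile_size):
            bounds.append((r, r + tile_size, c, c + tile_size))
    return bounds


def extract_tile(prob_map: List[List[float]], tile: Tile) -> List[List[float]]:
    """Cut one tile out of a 2-D map, e.g. a per-pixel probability map."""
    r0, r1, c0, c1 = tile
    return [row[c0:c1] for row in prob_map[r0:r1]]
```

In a pipeline like the one described, each extracted tile of Random Forest probabilities would be one input sample for the CNN, and assigning whole tiles to the training, validation, or test set keeps the splits spatially disjoint.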
Evaluating Spoken Language as a Biomarker for Automated Screening of Cognitive Impairment
Lima, Maria R., Capstick, Alexander, Geranmayeh, Fatemeh, Nilforooshan, Ramin, Matarić, Maja, Vaidyanathan, Ravi, Barnaghi, Payam
Timely and accurate assessment of cognitive impairment is a major unmet need in populations at risk. Alterations in speech and language can be early predictors of Alzheimer's disease and related dementias (ADRD) before clinical signs of neurodegeneration. Voice biomarkers offer a scalable and non-invasive solution for automated screening. However, the clinical applicability of machine learning (ML) remains limited by challenges in generalisability, interpretability, and access to patient data to train clinically applicable predictive models. Using DementiaBank recordings (N=291, 64% female), we evaluated ML techniques for ADRD screening and severity prediction from spoken language. We validated model generalisability with pilot data collected in-residence from older adults (N=22, 59% female). Risk stratification and linguistic feature importance analysis enhanced the interpretability and clinical utility of predictions. For ADRD classification, a Random Forest applied to lexical features achieved a mean sensitivity of 69.4% (95% confidence interval (CI) = 66.4-72.5) and specificity of 83.3% (78.0-88.7). On real-world pilot data, this model achieved a mean sensitivity of 70.0% (58.0-82.0) and specificity of 52.5% (39.3-65.7). For severity prediction using Mini-Mental State Examination (MMSE) scores, a Random Forest Regressor achieved a mean absolute MMSE error of 3.7 (3.7-3.8), with comparable performance of 3.3 (3.1-3.5) on pilot data. Linguistic features associated with higher ADRD risk included increased use of pronouns and adverbs, greater disfluency, reduced analytical thinking, lower lexical diversity and fewer words reflecting a psychological state of completion. Our interpretable predictive modelling offers a novel approach for in-home integration with conversational AI to monitor cognitive health and triage higher-risk individuals, enabling earlier detection and intervention.
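For readers less familiar with the screening metrics reported above, the following sketch shows how sensitivity and specificity are computed from binary predictions. The label convention (1 for ADRD, 0 for control) and the function name are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention: 1 = ADRD (positive class), 0 = control (negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    # Sensitivity: share of true cases detected.
    # Specificity: share of controls correctly cleared.
    return tp / (tp + fn), tn / (tn + fp)
```

A screening tool with high sensitivity but low specificity, as in the pilot-data result, catches most at-risk individuals at the cost of more false alarms.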
Random Forest Calibration
Shaker, Mohammad Hossein, Hüllermeier, Eyke
The Random Forest (RF) classifier is often claimed to be relatively well calibrated when compared with other machine learning methods. Moreover, the existing literature suggests that traditional calibration methods, such as isotonic regression, do not substantially enhance the calibration of RF probability estimates unless supplied with extensive calibration data sets, which can represent a significant obstacle in cases of limited data availability. Nevertheless, there seems to be no comprehensive study validating such claims and systematically comparing state-of-the-art calibration methods specifically for RF. To close this gap, we investigate a broad spectrum of calibration methods tailored to or at least applicable to RF, ranging from scaling techniques to more advanced algorithms. Our results, based on synthetic as well as real-world data, unravel the intricacies of RF probability estimates, scrutinize the impacts of hyper-parameters, and compare calibration methods in a systematic way. We show that a well-optimized RF performs as well as or better than leading calibration approaches.
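To make the notion of calibration concrete, here is a minimal sketch of a binned expected calibration error (ECE) for binary probability estimates. ECE is one common calibration measure; the abstract does not say which metrics the authors use, so the choice of metric, the equal-width binning, and the function name are all assumptions.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Binned ECE for binary problems.

    probs are predicted probabilities of the positive class; labels are 0 or 1.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equally long")
    prob_sum = [0.0] * n_bins  # sum of predicted probabilities per bin
    pos_sum = [0] * n_bins     # number of positive labels per bin
    count = [0] * n_bins       # number of samples per bin
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        prob_sum[idx] += p
        pos_sum[idx] += y
        count[idx] += 1
    total = len(probs)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue
        mean_prob = prob_sum[b] / count[b]
        frac_pos = pos_sum[b] / count[b]
        # Weight each bin's gap by the share of samples it holds.
        ece += (count[b] / total) * abs(mean_prob - frac_pos)
    return ece
```

A perfectly calibrated model has an ECE of zero under this definition: among samples predicted at roughly 0.8, about 80% are positive.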
An Explainable Disease Surveillance System for Early Prediction of Multiple Chronic Diseases
Khan, Shaheer Ahmad, Shahid, Muhammad Usamah, Abdullah, Ahmad, Hashmat, Ibrahim, Farooq, Muddassar
This study addresses a critical gap in the healthcare system by developing a clinically meaningful, practical, and explainable disease surveillance system for multiple chronic diseases, utilizing routine EHR data from multiple U.S. practices integrated with CureMD's EMR/EHR system. Unlike traditional systems--using AI models that rely on features from patients' labs--our approach focuses on routinely available data, such as medical history, vitals, diagnoses, and medications, to preemptively assess the risks of chronic diseases in the next year. We trained three distinct models for each chronic disease: prediction models that forecast the risk of a disease 3, 6, and 12 months before a potential diagnosis. We developed Random Forest models, which were internally validated using F1 scores and AUROC as performance metrics and further evaluated by a panel of expert physicians for clinical relevance based on inferences grounded in medical knowledge. Additionally, we discuss our implementation of integrating these models into a practical EMR system. Beyond using Shapley attributes and surrogate models for explainability, we also introduce a new rule-engineering framework to enhance the intrinsic explainability of Random Forests.
Review for NeurIPS paper: Joints in Random Forests
While the approach is presented as a general generative model based on DT and RF, the paper fails to show its practical interest beyond handling missing values at test time. The possibility of using the approach for outlier detection is potentially interesting but the experiment in the paper is restricted to a single dataset and does not include any comparison with competitors except Gaussian KDE. Overall, the properties of GeDT and GeRF as general purpose density estimators are not really studied. My feeling is that because the tree partitioning is unchanged with respect to standard discriminative DT and RF, GeDT and GeRF are probably only appropriate in the context of tasks related to target predictions. In other tasks, I don't see why they would perform better than pure PC models or other methods mentioned in the related work section.
Review for NeurIPS paper: Towards Convergence Rate Analysis of Random Forests for Classification
Weaknesses:
- The studied algorithms remain quite far from real random forests (no bootstrap sampling, split choices fully independent of the data, pruned trees, etc.).
- As in other results in the literature, the convergence rates for forests are a by-product of the convergence rates of individual trees (using Lemma 1). The results therefore do not really show the benefit of using forests instead of trees in terms of convergence rate. I think this should be discussed in the paper.
No real conclusion is drawn from the theoretical results that would help to better understand standard RF or suggest modifications to these methods. I think this kind of very technical contribution would be more appropriate for a journal submission than for a conference (given the limited time allotted for reviewing).
Review for NeurIPS paper: Towards Convergence Rate Analysis of Random Forests for Classification
The paper provides finite-sample convergence rates for two simplified variants of random forests. Overall, the contribution is purely theoretical. I personally think that this work sheds interesting new light on the behavior of a learning algorithm that is used intensively worldwide. This work clearly deserves acceptance as a poster at NeurIPS.
ILETIA: An AI-enhanced method for individualized trigger-oocyte pickup interval estimation of progestin-primed ovarian stimulation protocol
Wu, Binjian, Li, Qian, Kuang, Zhe, Gao, Hongyuan, Liu, Xinyi, Guo, Haiyan, Chen, Qiuju, Liu, Xinyi, Jiang, Yangruizhe, Zhang, Yuqi, Zha, Jinyin, Li, Mingyu, Ren, Qiuhan, Feng, Sishuo, Zhang, Haicang, Lu, Xuefeng, Zhang, Jian
In vitro fertilization-embryo transfer (IVF-ET) stands as one of the most prevalent treatments for infertility. During an IVF-ET cycle, the time interval between trigger shot and oocyte pickup (OPU) is a pivotal period for follicular maturation, which determines the yield of mature oocytes and impacts the success of subsequent procedures. However, accurate prediction of this interval is severely hindered by the variability of clinicians' experience, which often leads to suboptimal oocyte retrieval rates. To address this challenge, we propose ILETIA, the first machine learning-based method that can predict the optimal trigger-OPU interval for patients receiving the progestin-primed ovarian stimulation (PPOS) protocol. Specifically, ILETIA leverages a Transformer to learn representations from clinical tabular data, and then employs gradient-boosted trees for interval prediction. For model training and evaluation, we compiled a dataset, PPOS-DS, of nearly ten thousand patients receiving the PPOS protocol, the largest such dataset to our knowledge. Experimental results demonstrate that our method achieves strong performance (AUROC = 0.889), outperforming both clinicians and other widely used computational models. Moreover, ILETIA also supports prediction of premature ovulation risk at a specific OPU time (AUROC = 0.838). Collectively, by enabling more precise and individualized decisions, ILETIA has the potential to improve clinical outcomes and lay the foundation for future IVF-ET research.
Reviews: A Debiased MDI Feature Importance Measure for Random Forests
I am updating my score from 7 to 8. --- # Originality The main contributions are all original. While the take-home message of the study is in retrospect simple and obvious (compute MDI importances on out-of-bag samples), the paper provides an original analysis that explains and justifies this modification of the computation of MDI importances. Some remarks however: - I would have appreciated a controlled experiment where G0(T) can be computed exactly in order to empirically appreciate the (supposed) tightness of the bound. More specifically, what if A1 and A2 are not satisfied? In real-world setups, A1 is very unlikely to hold.
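As a rough illustration of the take-home message, the sketch below evaluates the Gini impurity decrease of one fixed split on two different samples. This is a deliberately naive simplification and not the paper's MDI-oob estimator, whose exact computation is not given here; the function names, the binary-label restriction, and the toy data are all assumptions.

```python
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity for binary labels (0 or 1); an empty node counts as pure."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def impurity_decrease(
    x: Sequence[float], y: Sequence[int], threshold: float
) -> float:
    """Gini decrease of the split `x <= threshold`, measured on the sample (x, y)."""
    n = len(y)
    if n == 0:
        return 0.0
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return gini(y) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)


# Toy data: the split at 0.5 was "learned" on the in-bag sample, where it
# happens to separate the classes perfectly.
in_bag_x, in_bag_y = [0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]
# On held-out samples the same feature carries no signal.
oob_x, oob_y = [0.3, 0.4, 0.6, 0.7], [0, 1, 0, 1]

in_bag_gain = impurity_decrease(in_bag_x, in_bag_y, 0.5)  # 0.5
oob_gain = impurity_decrease(oob_x, oob_y, 0.5)           # 0.0
```

In this toy case the split looks maximally useful on the data used to choose it and useless on held-out data, which is the kind of optimism that evaluating importances on out-of-bag samples is meant to reduce.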
Reviews: A Debiased MDI Feature Importance Measure for Random Forests
The paper theoretically studies the bias of the popular MDI importance measures in the presence of noisy features and proposes a very simple practical solution to reduce it. Two reviewers are very enthusiastic about the paper, even more so after reading the authors' response. One reviewer has several valid concerns about missing links between theory and practice but still recommends acceptance. I therefore recommend accepting the paper. The authors are asked to take into account the reviewers' comments when preparing the final version of their paper and, in particular, to address the specific request of reviewer 2 (to clarify how MDI-oob is computed).