loocv
On Optimal Generalizability in Parametric Learning
We consider the parametric learning problem, where the objective of the learner is determined by a parametric loss function. Under empirical risk minimization, possibly with regularization, the inferred parameter vector is biased toward the training samples. In practice, this bias is measured by cross validation, where the data set is partitioned into a training set and a validation set that is held out of training to measure out-of-sample performance. A classical strategy is leave-one-out cross validation (LOOCV), where one sample is left out for validation, training is done on the remaining samples, and the process is repeated for every sample. LOOCV is rarely used in practice because of its high computational complexity. In this paper, we first develop a computationally efficient approximate LOOCV (ALOOCV) and provide theoretical guarantees for its performance. We then use ALOOCV to derive an optimization algorithm for finding the regularizer in the empirical risk minimization framework. In our numerical experiments, we illustrate the accuracy and efficiency of ALOOCV as well as of our proposed framework for optimizing the regularizer.
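To make the cost argument concrete, here is a minimal sketch of LOOCV for ridge regression. It is not the ALOOCV method of the paper: it assumes squared loss, a fixed L2 penalty `lam`, and no intercept, a special case where the leave-one-out residual is known exactly from a single fit as `residual_i / (1 - H_ii)`. All function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loocv_brute_force(X, y, lam):
    """Exact LOOCV: refit n times, holding out one sample each time."""
    n = X.shape[0]
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # boolean mask dropping sample i
        w = ridge_fit(X[keep], y[keep], lam)
        errors[i] = (y[i] - X[i] @ w) ** 2
    return errors.mean()

def loocv_hat_matrix(X, y, lam):
    """Same quantity from one fit, via the hat matrix H = X (X'X + lam I)^-1 X'."""
    d = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    residuals = y - H @ y
    return np.mean((residuals / (1.0 - np.diag(H))) ** 2)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=40)

brute = loocv_brute_force(X, y, lam=1.0)   # n separate fits
fast = loocv_hat_matrix(X, y, lam=1.0)     # a single fit
```

The brute-force version needs n fits, which is the cost the abstract refers to. The single-fit shortcut is exact only for this linear case; for general parametric losses an approximation is needed.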
- Research Report > New Finding (0.92)
- Research Report > Experimental Study (0.92)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets
Konrad, Phongsakon Mark, Popa, Andrei-Alexandru, Sabzehmeidani, Yaser, Zhong, Liang, Liehn, Elisa A., Ayvaz, Serkan
Accurate segmentation of carotid artery structures in histopathological images is vital for advancing cardiovascular disease research and diagnosis. However, deep learning model development in this domain is constrained by the scarcity of annotated cardiovascular histopathological data. This study presents a systematic evaluation of state-of-the-art deep learning segmentation models, including convolutional neural networks (U-Net, DeepLabV3+), a Vision Transformer (SegFormer), and recent foundation models (SAM, MedSAM, MedSAM+UNet), on a limited dataset of cardiovascular histology images. Despite an extensive hyperparameter optimization strategy with Bayesian search, our findings reveal that model performance is highly sensitive to data splits, with minor differences driven more by statistical noise than by true algorithmic superiority. This instability exposes the limitations of standard benchmarking practices in low-data clinical settings and challenges the assumption that performance rankings reflect meaningful clinical utility.
- Asia > Singapore (0.04)
- Europe > Denmark > Southern Denmark (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > Iowa > Johnson County > Iowa City (0.14)
- North America > Canada > Quebec > Montreal (0.14)
The use of cross validation in the analysis of designed experiments
Weese, Maria L., Smucker, Byran J., Edwards, David J.
Cross-validation (CV) is a common method for tuning machine learning methods and can also be used for model selection in regression. Because of the structured nature of small, traditional experimental designs, the literature has warned against using CV in their analysis. The striking increase in the use of machine learning, and thus CV, in the analysis of experimental designs has led us to study empirically the effectiveness of CV compared to other methods of selecting models in designed experiments, including the little bootstrap. We consider both response surface settings, where prediction is of primary interest, and screening, where factor selection is most important. Overall, we provide evidence that the use of leave-one-out cross-validation (LOOCV) in the analysis of small, structured experimental designs is often useful. More general $k$-fold CV may also be competitive, but its performance is uneven.
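The relationship between $k$-fold CV and LOOCV mentioned above can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes ordinary least squares on a small two-level design, and the 12-run design, the coefficients, and the helper names are all made up for the example. LOOCV is the special case $k = n$, which has no dependence on how the runs are shuffled into folds.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cv_mse(X, y, k, seed=0):
    """Mean squared prediction error of least squares under k-fold CV."""
    n = X.shape[0]
    sq_errors = np.empty(n)
    for fold in kfold_indices(n, k, seed):
        train = np.setdiff1d(np.arange(n), fold)   # all runs not in this fold
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        sq_errors[fold] = (y[fold] - X[fold] @ coef) ** 2
    return sq_errors.mean()

# Hypothetical 12-run, 3-factor two-level design with an intercept column.
rng = np.random.default_rng(1)
factors = rng.choice([-1.0, 1.0], size=(12, 3))
X = np.column_stack([np.ones(12), factors])
y = X @ np.array([10.0, 2.0, 0.0, -1.5]) + rng.normal(scale=0.5, size=12)

loocv = cv_mse(X, y, k=12)            # k = n: one run held out at a time
four_fold_a = cv_mse(X, y, k=4, seed=0)
four_fold_b = cv_mse(X, y, k=4, seed=1)
```

With $k < n$ the estimate generally changes with the random fold assignment (compare `four_fold_a` and `four_fold_b`), which is one source of the uneven behavior in small designs; the LOOCV estimate is deterministic for a given data set.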
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- North America > United States > South Carolina > Charleston County > Charleston (0.04)
- North America > United States > Ohio > Butler County > Oxford (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Health & Medicine (1.00)
- Materials (0.93)
- Energy (0.67)
Multivariate Analysis of Gut Microbiota Composition and Prevalence of Gastric Cancer
Shankarnarayanan, Aadhith, Gangopadhyay, Dheeman, Alzaatreh, Ayman
The global surge in gastric cancer cases has prompted investigation into the potential of gut microbiota as a predictive marker for the disease. Alterations in gut diversity are suspected to be associated with an elevated risk of gastric cancer. This paper examines the correlation between gut microbiota and gastric cancer, focusing on patients who have undergone total and subtotal gastrectomy. Using data mining and statistical learning methods, an analysis was conducted on 16S rRNA gene sequences obtained from 96 participants, with the aim of identifying specific genera of gut microbiota associated with gastric cancer. The study reveals several prominent bacterial genera that could potentially serve as biomarkers for assessing the risk of gastric cancer. These findings offer a pathway for early risk assessment and precautionary measures in the diagnosis of gastric cancer. The intricate mechanisms through which these gut microbiota influence gastric cancer progression warrant further investigation. This research aims to contribute significantly to the growing understanding of the gut-cancer axis and its implications for disease prediction and prevention.
- Asia > Middle East > UAE > Sharjah Emirate > Sharjah (0.06)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Health & Medicine > Therapeutic Area > Oncology > Gastric Cancer (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)