Cross-validation method


The role of data partitioning on the performance of EEG-based deep learning models in supervised cross-subject analysis: a preliminary study

Del Pup, Federico, Zanola, Andrea, Tshimanga, Louis Fabrice, Bertoldo, Alessandra, Finos, Livio, Atzori, Manfredo

arXiv.org Artificial Intelligence

Deep learning is significantly advancing the analysis of electroencephalography (EEG) data by effectively discovering highly nonlinear patterns within the signals. Data partitioning and cross-validation are crucial for assessing model performance and ensuring study comparability, as they can produce varied results and introduce data leakage due to specific signal properties (e.g., subject-specific biometric signatures). Such variability leads to incomparable studies and, increasingly, overestimated performance claims, which are detrimental to the field. Nevertheless, no comprehensive guidelines for proper data partitioning and cross-validation exist in the domain, nor is there a quantitative evaluation of their impact on model accuracy, reliability, and generalizability. To assist researchers in identifying optimal experimental strategies, this paper thoroughly investigates the role of data partitioning and cross-validation in evaluating EEG deep learning models. Five cross-validation settings are compared across three supervised cross-subject classification tasks (BCI, Parkinson's, and Alzheimer's disease detection) and four established architectures of increasing complexity (ShallowConvNet, EEGNet, DeepConvNet, and Temporal-based ResNet). The comparison of over 100,000 trained models underscores, first, the importance of using subject-based cross-validation strategies for evaluating EEG deep learning models, except when within-subject analyses are acceptable (e.g., BCI). Second, it highlights the greater reliability of nested approaches (N-LNSO) compared to non-nested counterparts, which are prone to data leakage and favor larger models overfitting to validation data. In conclusion, this work provides EEG deep learning researchers with an analysis of data partitioning and cross-validation and offers guidelines to avoid data leakage, currently undermining the domain with potentially overestimated performance claims.
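
As a minimal illustration of the leakage the authors warn about, the scikit-learn sketch below contrasts a naive sample-level KFold split with a subject-level GroupKFold split. The synthetic arrays and the `subject_ids` grouping are illustrative stand-ins, not the paper's nested leave-N-subjects-out (N-LNSO) protocol.

```python
# Minimal sketch: sample-level vs. subject-level cross-validation.
# Synthetic data stands in for EEG windows; a real pipeline would use
# actual epochs and the paper's nested leave-N-subjects-out scheme.
import numpy as np
from sklearn.model_selection import KFold, GroupKFold

rng = np.random.default_rng(0)
n_subjects, windows_per_subject, n_features = 20, 50, 64
X = rng.normal(size=(n_subjects * windows_per_subject, n_features))
y = rng.integers(0, 2, size=len(X))
subject_ids = np.repeat(np.arange(n_subjects), windows_per_subject)

# Sample-level split: windows from the same subject can appear in both
# train and test folds, so subject-specific (biometric) signal leaks.
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    leaked = np.intersect1d(subject_ids[train_idx], subject_ids[test_idx])
    print(f"KFold: {len(leaked)} subjects appear on both sides")
    break

# Subject-level split: each subject's windows stay on one side only, so
# the test score reflects generalization to unseen subjects.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=subject_ids):
    leaked = np.intersect1d(subject_ids[train_idx], subject_ids[test_idx])
    print(f"GroupKFold: {len(leaked)} subjects appear on both sides")
    break
```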


Empirical investigation of multi-source cross-validation in clinical machine learning

Leinonen, Tuija, Wong, David, Wahab, Ali, Nadarajah, Ramesh, Kaisti, Matti, Airola, Antti

arXiv.org Machine Learning

Traditionally, machine learning-based clinical prediction models have been trained and evaluated on patient data from a single source, such as a hospital. Cross-validation methods can be used to estimate the accuracy of such models on new patients originating from the same source, by repeated random splitting of the data. However, such estimates tend to be highly overoptimistic when compared to accuracy obtained from deploying models to sources not represented in the dataset, such as a new hospital. The increasing availability of multi-source medical datasets provides new opportunities for obtaining more comprehensive and realistic evaluations of expected accuracy through source-level cross-validation designs. In this study, we present a systematic empirical evaluation of standard K-fold cross-validation and leave-source-out cross-validation methods in a multi-source setting. We consider the task of electrocardiogram-based cardiovascular disease classification, combining and harmonizing the openly available PhysioNet CinC Challenge 2021 and Shandong Provincial Hospital datasets for our study. Our results show that K-fold cross-validation, both on single-source and multi-source data, systematically overestimates prediction performance when the end goal is to generalize to new sources. Leave-source-out cross-validation provides more reliable performance estimates, with close to zero bias though larger variability. The evaluation highlights the dangers of obtaining misleading cross-validation results on medical data and demonstrates how these issues can be mitigated when having access to multi-source data.
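
A hedged sketch of the contrast the study draws, using scikit-learn: pooled K-fold versus leave-source-out splits on synthetic multi-source data with a per-source distribution shift. The features, the shift, and the classifier are illustrative, not the paper's harmonized ECG pipeline.

```python
# Sketch: pooled K-fold vs. leave-source-out CV on multi-source data.
# Synthetic features stand in for harmonized ECG descriptors; `source`
# marks which hospital/dataset each record came from.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 20))
source = np.repeat(np.arange(6), 100)        # 6 sources, 100 patients each
X += source[:, None] * 0.5                   # per-source distribution shift
y = (X[:, 0] + rng.normal(scale=2.0, size=600) > source * 0.5).astype(int)

clf = LogisticRegression(max_iter=1000)

# Pooled K-fold mixes all sources in every fold: optimistic for new sources.
kfold_auc = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=1),
                            scoring="roc_auc")
# Leave-source-out holds out one whole source per fold: realistic but noisier.
lso_auc = cross_val_score(clf, X, y, cv=LeaveOneGroupOut(), groups=source,
                          scoring="roc_auc")
print(f"pooled 5-fold AUC: {kfold_auc.mean():.3f}")
print(f"leave-source-out AUC: {lso_auc.mean():.3f} (sd {lso_auc.std():.3f})")
```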


Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting

Ghasemzadeh, Hamzeh, Hillman, Robert E., Mehta, Daryush D.

arXiv.org Artificial Intelligence

This study's first purpose is to provide quantitative evidence that would incentivize researchers to use the more robust method of nested cross-validation in place of non-nested alternatives. The second purpose is to present methods and MATLAB code for performing power analysis for ML-based studies during their design. Monte Carlo simulations were used to quantify the interactions between the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, and the dimensionality of the model. Four different cross-validation schemes (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and statistical confidence of the ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (α = 0.05, 1 − β = 0.8). Statistical confidence of the model was defined as the probability of the correct features being selected and hence included in the final model. Our analysis showed that the model generated with the single holdout method had very low statistical power and statistical confidence and significantly overestimated the accuracy. Conversely, nested 10-fold cross-validation resulted in the highest statistical confidence and the highest statistical power, while providing an unbiased estimate of the accuracy. The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used. Confidence in the model based on nested cross-validation was as much as four times higher than confidence in the single holdout-based model. A computational model, MATLAB code, and lookup tables are provided to assist researchers with estimating sample size during the design of their future studies.
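
The study's MATLAB code is not reproduced here; the following scikit-learn sketch shows the non-nested versus nested pattern it compares, with a grid-searched SVM as a stand-in model and illustrative fold counts.

```python
# Sketch: non-nested vs. nested cross-validation for a tuned SVM.
# Non-nested CV tunes and scores on the same folds, inflating accuracy;
# nested CV keeps hyperparameter search inside each outer training fold.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}

inner = KFold(n_splits=10, shuffle=True, random_state=0)
outer = KFold(n_splits=10, shuffle=True, random_state=1)

search = GridSearchCV(SVC(), param_grid, cv=inner)

# Non-nested: the best tuned score, evaluated on the folds used for tuning.
search.fit(X, y)
print(f"non-nested (biased) accuracy: {search.best_score_:.3f}")

# Nested: the whole search is re-run inside each outer training split.
nested = cross_val_score(search, X, y, cv=outer)
print(f"nested 10-fold accuracy: {nested.mean():.3f}")
```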


Top 10 AI Articles for April 2022

#artificialintelligence

The OpenVINO toolkit offers tools and libraries that optimize neural networks through techniques such as pruning and quantization, speeding up inference in a hardware-agnostic way across Intel architectures. Intel released the toolkit's most significant update since its launch, which includes more deep-learning models, device portability, and higher inferencing performance with fewer code changes.


Combinatorial PurgedKFold Cross-Validation for Deep Reinforcement Learning

#artificialintelligence

Originally published on Towards AI, the world's leading AI and technology news and media company. This article is written by Berend Gort & Bruce Yang, core team members of the open-source project AI4Finance, a community sharing AI tools for finance and affiliated with Columbia University in New York. Our previous article described the Combinatorial PurgedKFold Cross-Validation method in detail for classifiers (or regressors) with regular predictions.
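
The article's own implementation is not shown in this excerpt; below is a minimal, hedged sketch of the general idea behind combinatorial purged K-fold splitting, with fold counts and embargo width chosen for illustration rather than taken from the article.

```python
# Sketch of combinatorial purged K-fold splits for serially correlated
# (e.g., financial) data, loosely after Lopez de Prado's CPCV. The fold
# counts and embargo width here are illustrative assumptions.
from itertools import combinations
import numpy as np

def combinatorial_purged_splits(n_samples, n_groups=6, n_test_groups=2, embargo=5):
    """Yield (train_idx, test_idx) over all C(n_groups, n_test_groups)
    combinations, purging `embargo` samples around every test group so
    that temporally overlapping neighbours never enter the training set."""
    groups = np.array_split(np.arange(n_samples), n_groups)
    for test_ids in combinations(range(n_groups), n_test_groups):
        test_idx = np.concatenate([groups[i] for i in test_ids])
        banned = set(test_idx)
        for i in test_ids:                  # embargo around each test block
            lo, hi = groups[i][0] - embargo, groups[i][-1] + embargo
            banned.update(range(max(lo, 0), min(hi + 1, n_samples)))
        train_idx = np.array([j for j in range(n_samples) if j not in banned])
        yield train_idx, test_idx

for train_idx, test_idx in combinatorial_purged_splits(120):
    print(len(train_idx), len(test_idx))
    break  # 15 combinations in total for C(6, 2)
```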


Use integrated explainability tools and improve model quality using Amazon SageMaker Autopilot

#artificialintelligence

A few minutes later, the kernel should be started and ready to go. Depending on your preference, you can either create an Autopilot job through the Studio user interface without writing a single line of code, or use the SageMaker SDK in a SageMaker notebook. The accompanying notebook uses the SageMaker SDK to create an Autopilot job; for simplicity, we explore the no-code approach using the Studio console to demonstrate these new features.
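
For the SDK route, a minimal sketch is shown below, assuming the SageMaker Python SDK v2 `AutoML` class; the S3 paths and the target column name are placeholders, not values from the post.

```python
# Minimal sketch of launching an Autopilot job via the SageMaker Python
# SDK v2 `AutoML` class. The S3 paths and the target column name ("y")
# are placeholders, not values from the post.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = sagemaker.get_execution_role()   # requires a SageMaker environment

automl = AutoML(
    role=role,
    target_attribute_name="y",                       # column to predict (placeholder)
    sagemaker_session=session,
    max_candidates=10,                               # cap explored pipelines
    output_path="s3://my-bucket/autopilot-output/",  # placeholder bucket
)

# Launch the Autopilot job on a headered CSV containing the "y" column.
automl.fit(inputs="s3://my-bucket/train.csv", wait=False)
```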


Anticipatory Detection of Compulsive Body-focused Repetitive Behaviors with Wearables

Searle, Benjamin Lucas, Spathis, Dimitris, Constantinides, Marios, Quercia, Daniele, Mascolo, Cecilia

arXiv.org Artificial Intelligence

Body-focused repetitive behaviors (BFRBs), like face-touching or skin-picking, are hand-driven behaviors that can damage one's appearance if not identified early and treated. Technology for automatic detection is still under-explored, with the few previous works limited to wearables with single modalities (e.g., motion). Here, we propose a multi-sensory approach combining motion, orientation, and heart rate sensors to detect BFRBs. We conducted a feasibility study in which participants (N=10) were exposed to BFRBs-inducing tasks, and analyzed 380 minutes of signals under an extensive evaluation of sensing modalities, cross-validation methods, and observation windows. Our models achieved an AUC > 0.90 in distinguishing BFRBs, which were more evident in observation windows 5 minutes prior to the behavior as opposed to 1-minute ones. In a follow-up qualitative survey, we found that not only does the timing of detection matter, but models also need to be context-aware when designing just-in-time interventions to prevent BFRBs.
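
As a rough illustration of the anticipatory windowing described above, the sketch below slices pre-onset observation windows from a synthetic multimodal stream; the sampling rate, channel count, and onset times are assumptions, not the study's recording setup.

```python
# Sketch: build pre-onset observation windows from synchronized wearable
# streams. Sampling rate, channel count, and onset times are illustrative.
import numpy as np

fs = 20                                   # Hz, assumed common sampling rate
signals = np.random.default_rng(2).normal(size=(fs * 3600, 7))  # motion, HR, ...
onsets_s = [600, 1500, 2900]              # labeled BFRB onset times (seconds)

def pre_onset_window(stream, onset_s, minutes):
    """Slice the `minutes`-long segment ending at the behavior onset."""
    end = onset_s * fs
    start = end - minutes * 60 * fs
    return stream[max(start, 0):end]

# Per the abstract, 5-minute windows carried more anticipatory signal
# than 1-minute ones.
windows_5min = [pre_onset_window(signals, t, 5) for t in onsets_s]
windows_1min = [pre_onset_window(signals, t, 1) for t in onsets_s]
print(windows_5min[0].shape, windows_1min[0].shape)  # (6000, 7) (1200, 7)
```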


Exploring Text-transformers in AAAI 2021 Shared Task: COVID-19 Fake News Detection in English

Li, Xiangyang, Xia, Yu, Long, Xiang, Li, Zheng, Li, Sujian

arXiv.org Artificial Intelligence

In this paper, we describe our system for the AAAI 2021 shared task of COVID-19 Fake News Detection in English, where we achieved 3rd place with a weighted F1 score of 0.9859 on the test set. Specifically, we propose an ensemble of different pre-trained language models such as BERT, RoBERTa, and ERNIE, with various training strategies including warm-up, learning-rate scheduling, and k-fold cross-validation. We also conduct an extensive analysis of the samples that are not correctly classified. The code is available at: https://github.com/archersama/3rd-solution-COVID19-Fake-News-Detection-in-English.
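
A minimal sketch of the k-fold ensembling pattern such systems use, with a scikit-learn classifier standing in for the fine-tuned language models; the data and fold count are illustrative.

```python
# Sketch of K-fold ensembling: one model is trained per fold, and the
# test-set probabilities are averaged (soft voting). A simple classifier
# stands in for the BERT/RoBERTa/ERNIE fine-tuning runs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, n_features=50, random_state=0)
X_train, y_train, X_test = X[:400], y[:400], X[400:]

test_probs = []
for fold_train, _fold_val in StratifiedKFold(5, shuffle=True,
                                             random_state=0).split(X_train, y_train):
    # In the real system this would be one fine-tuning run of a pre-trained
    # language model, early-stopped on the held-out fold (_fold_val).
    model = LogisticRegression(max_iter=1000).fit(X_train[fold_train],
                                                  y_train[fold_train])
    test_probs.append(model.predict_proba(X_test))

ensemble_pred = np.mean(test_probs, axis=0).argmax(axis=1)  # soft-voted labels
print(ensemble_pred[:10])
```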


Causal Interaction Trees: Tree-Based Subgroup Identification for Observational Data

Yang, Jiabei, Dahabreh, Issa J., Steingrimsson, Jon A.

arXiv.org Machine Learning

We propose Causal Interaction Trees for identifying subgroups of participants that have enhanced treatment effects using observational data. We extend the Classification and Regression Tree algorithm by using splitting criteria that focus on maximizing between-group treatment effect heterogeneity based on subgroup-specific treatment effect estimators to dictate decision-making in the algorithm. We derive properties of three subgroup-specific treatment effect estimators that account for the observational nature of the data -- inverse probability weighting, g-formula and doubly robust estimators. We study the performance of the proposed algorithms using simulations and implement the algorithms in an observational study that evaluates the effectiveness of right heart catheterization on critically ill patients.
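
For readers unfamiliar with the three estimators named above, the sketch below computes inverse probability weighting, g-formula, and doubly robust estimates of an average treatment effect on synthetic data; the nuisance models and the data-generating process are illustrative, not the paper's splitting criteria.

```python
# Sketch of the three treatment-effect estimators on synthetic
# observational data; the true effect is 2.0 by construction.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-X[:, 0]))               # confounded treatment assignment
T = rng.binomial(1, p)
Y = 2.0 * T + X[:, 0] + rng.normal(size=n)

e = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]     # propensity e(X)
m1 = LinearRegression().fit(X[T == 1], Y[T == 1]).predict(X)  # E[Y | X, T=1]
m0 = LinearRegression().fit(X[T == 0], Y[T == 0]).predict(X)  # E[Y | X, T=0]

ipw = np.mean(T * Y / e - (1 - T) * Y / (1 - e))
g_formula = np.mean(m1 - m0)
dr = np.mean(m1 - m0 + T * (Y - m1) / e - (1 - T) * (Y - m0) / (1 - e))
print(f"IPW {ipw:.2f}  g-formula {g_formula:.2f}  doubly robust {dr:.2f}")
```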


Tournament Leave-pair-out Cross-validation for Receiver Operating Characteristic (ROC) Analysis

Perez, Ileana Montoya, Airola, Antti, Boström, Peter J., Jambor, Ivan, Pahikkala, Tapio

arXiv.org Machine Learning

Receiver operating characteristic (ROC) analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating the area under the ROC curve (AUC) with standard cross-validation methods suffers from a large bias. Leave-pair-out (LPO) cross-validation has been shown to correct this bias. However, while LPO produces an almost unbiased estimate of the AUC, it does not provide the ranking of the data needed for plotting and analyzing the ROC curve. In this study, we propose a new method called tournament leave-pair-out (TLPO) cross-validation. This method extends LPO by creating a tournament from the pair comparisons, producing a ranking of the data. TLPO preserves the advantage of LPO for estimating the AUC, while also allowing ROC analysis. Using both synthetic and real-world data, we show that TLPO is as reliable as LPO for AUC estimation, and we confirm the bias of leave-one-out cross-validation on low-dimensional data.
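
A hedged sketch of plain leave-pair-out AUC estimation, which TLPO builds on: every positive-negative pair is held out once, and the AUC estimate is the fraction of pairs the model ranks correctly. The tournament step that turns pairwise outcomes into a full ranking is only noted in a comment, not implemented.

```python
# Sketch of leave-pair-out (LPO) AUC estimation. TLPO would additionally
# organise the pairwise wins into a tournament to produce a full ranking
# of the data for ROC plotting; that step is omitted here.
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=40, n_features=5, random_state=0)
pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]

wins = 0
pairs = list(itertools.product(pos, neg))
for i, j in pairs:
    train = np.setdiff1d(np.arange(len(y)), [i, j])   # leave the pair out
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    s = model.decision_function(X[[i, j]])
    wins += s[0] > s[1]                               # positive ranked higher?
print(f"LPO AUC estimate: {wins / len(pairs):.3f}")
```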