cross-validation strategy
The CAST package for training and assessment of spatial prediction models in R
Meyer, Hanna, Ludwig, Marvin, Milà, Carles, Linnenbrink, Jan, Schumacher, Fabian
One key task in environmental science is to map environmental variables continuously in space, or even in space and time. Machine learning algorithms are frequently used to learn from local field observations and make spatial predictions, estimating the value of the variable of interest in places where it has not been measured. However, applying machine learning to spatial mapping involves additional challenges compared to "non-spatial" prediction tasks, which often originate from spatial autocorrelation and from training data that are not independent and identically distributed. In the past few years, we have developed a number of methods to support the application of machine learning to spatial data, including suitable cross-validation strategies for performance assessment and model selection, spatial feature selection, and methods to assess the area of applicability of the trained models. The intention of the CAST package is to support the application of machine learning for predictive mapping by implementing such methods and making them available for easy integration into modelling workflows. Here we introduce the CAST package and its core functionalities. Using the case study of mapping plant species richness, we go through the different steps of the modelling workflow and show how CAST can be used to support more reliable spatial predictions.
- Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.05)
- North America > United States > New York (0.04)
- South America > Chile (0.04)
- Workflow (0.75)
- Research Report (0.50)
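The leave-location-out idea behind CAST's spatial cross-validation can be sketched outside R as well. The snippet below is a hedged illustration, not the package's implementation: it uses scikit-learn's `GroupKFold` on synthetic data so that all samples from a sampling location land in the same fold.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_sites, per_site = 10, 20                   # 10 sampling locations, 20 samples each
site = np.repeat(np.arange(n_sites), per_site)
X = rng.normal(size=(n_sites * per_site, 3))

# GroupKFold keeps every sample from a location in the same fold, so the
# model is always evaluated on locations it never saw during training.
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, groups=site):
    train_sites, test_sites = set(site[train_idx]), set(site[test_idx])
    assert train_sites.isdisjoint(test_sites)   # no spatial leakage
print("all folds keep locations disjoint")
```

With spatially autocorrelated data, this kind of split typically yields lower (and more honest) performance estimates than a random k-fold split.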
Streamlined Framework for Agile Forecasting Model Development towards Efficient Inventory Management
Soeseno, Jonathan Hans, González, Sergio, Chen, Trista Pei-Chun
This paper proposes a framework for developing forecasting models by streamlining the connections between the core components of the development process. The proposed framework enables swift and robust integration of new datasets, experimentation with different algorithms, and selection of the best models. We start with datasets from different problem settings and apply pre-processing steps to clean them and engineer meaningful representations of the time-series data. To identify robust training configurations, we introduce a novel mechanism combining multiple cross-validation strategies. We apply different evaluation metrics to find the best-suited models for varying applications. One of the reference applications is our participation in the intelligent forecasting competition held by the United States Agency for International Development (USAID). Finally, we leverage the flexibility of the framework by applying different evaluation metrics to assess the performance of the models in inventory management settings.
- North America > United States (1.00)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
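The paper's multi-strategy cross-validation mechanism is not spelled out in the abstract; as a hedged sketch of the time-series variant such a framework would typically include, scikit-learn's `TimeSeriesSplit` produces expanding-window splits in which the model never trains on the future:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(24)                       # e.g. 24 months of demand history
for i, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=4).split(series)):
    # the training window always precedes the test window: no future leakage
    assert train_idx.max() < test_idx.min()
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)}")
```

Each successive fold grows the training window while holding out the next block of observations, which mirrors how a deployed forecaster accumulates history.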
Machine Learning to Predict the Antimicrobial Activity of Cold Atmospheric Plasma-Activated Liquids
Ozdemir, Mehmet Akif, Ozdemir, Gizem Dilara, Gul, Merve, Guren, Onan, Ercan, Utku Kursat
Plasma is defined as the fourth state of matter, and non-thermal plasma can be produced at atmospheric pressure under a high electrical field. The strong and broad-spectrum antimicrobial effect of plasma-activated liquids (PALs) is now well known. The proven applicability of machine learning (ML) in the medical field is encouraging for its application in plasma medicine as well. Thus, ML applications on PALs could offer a new perspective for better understanding the influence of various parameters on their antimicrobial effects. In this paper, comparative supervised ML models are presented, using previously obtained data, to qualitatively predict the in vitro antimicrobial activity of PALs. A literature search was performed and data were collected from 33 relevant articles. After the required preprocessing steps, two supervised ML approaches, namely classification and regression, were applied to the data to obtain microbial inactivation (MI) predictions. For classification, MI was labeled in four categories; for regression, MI was used as a continuous variable. Two robust cross-validation strategies were used to evaluate the classification and regression models: repeated stratified k-fold cross-validation and k-fold cross-validation, respectively. We also investigate the effect of different features on the models. The results demonstrate that the hyperparameter-optimized Random Forest Classifier (oRFC) and Random Forest Regressor (oRFR) provided better results than the other models for classification and regression, respectively. The best test accuracy obtained was 82.68% for the oRFC, and the best R2 was 0.75 for the oRFR. ML techniques could contribute to a better understanding of the plasma parameters that play a dominant role in the desired antimicrobial effect. Furthermore, such findings may contribute to the definition of a plasma dose in the future.
- Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
- Europe > Poland (0.04)
- Europe > Sweden > Skåne County > Malmö (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
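The evaluation scheme named in the abstract, repeated stratified k-fold cross-validation, can be sketched as follows; the features and MI labels below are synthetic placeholders, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                # placeholder plasma parameters
y = rng.integers(0, 4, size=200)             # 4 MI categories, as in the paper

# stratification keeps the class balance in every fold; repeating the split
# averages out the luck of any single fold assignment
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
print(f"{len(scores)} folds, mean accuracy {scores.mean():.2f}")
```

With random labels the mean accuracy hovers near chance (0.25 for 4 classes); on real data the same scaffolding yields the paper's style of accuracy estimate.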
The Fed - Machine Learning, the Treasury Yield Curve and Recession Forecasting
We use machine learning methods to examine the power of Treasury term spreads and other financial market and macroeconomic variables to forecast US recessions, vis-à-vis probit regression. In particular, we propose a novel strategy for conducting cross-validation on classifiers trained with macro/financial panel data of low frequency, and compare the results to those obtained from standard k-fold cross-validation. Consistent with the existing literature, we find that, in the time-series setting, forecast accuracy estimates derived from k-fold cross-validation are optimistically biased, and that cross-validation strategies which eliminate data "peeking" produce lower, and perhaps more realistic, estimates of forecast accuracy. That is, while k-fold cross-validation indicates that the forecast accuracy of tree methods dominates that of neural networks, which in turn dominates that of probit regression, the more conservative cross-validation strategy we propose indicates the exact opposite: probit regression should be preferred over machine learning methods, at least in the context of the present problem. This latter result stands in contrast to a growing body of literature demonstrating that machine learning methods outperform many alternative classification algorithms, and we discuss some possible reasons for it.
- Banking & Finance > Economy (1.00)
- Government > Regional Government > North America Government > United States Government (0.40)
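The contrast the paper draws can be made concrete with a small sketch. The authors' panel-data cross-validation is bespoke; this only illustrates the underlying point that a shuffled k-fold split lets training folds contain observations from the test fold's future, while an expanding-window split never peeks ahead:

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

n = 120                                      # e.g. 120 quarters of macro data
idx = np.arange(n)

# "peeking": does any training set contain an index later than the test start?
kf_peeks = any(tr.max() > te.min()
               for tr, te in KFold(n_splits=5, shuffle=True,
                                   random_state=0).split(idx))
ts_peeks = any(tr.max() > te.min()
               for tr, te in TimeSeriesSplit(n_splits=5).split(idx))
print("shuffled k-fold peeks into the future:", kf_peeks)
print("expanding window peeks:", ts_peeks)
```

With autocorrelated macro series, that peeking is exactly what inflates k-fold accuracy estimates relative to the conservative scheme the paper proposes.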
Predicting into unknown space? Estimating the area of applicability of spatial prediction models
Predictive modelling using machine learning has become very popular for spatial mapping of the environment. Models are often applied to make predictions far beyond the sampling locations, where new geographic locations might differ considerably from the training data in their environmental properties. However, areas in the predictor space without support from training data are problematic: since the model has no knowledge of these environments, predictions there have to be considered uncertain, and estimating the area to which a prediction model can be reliably applied is required. Here, we suggest a methodology that delineates the "area of applicability" (AOA), which we define as the area for which the cross-validation error of the model applies. We first propose a "dissimilarity index" (DI) based on the minimum distance to the training data in the predictor space, with predictors weighted by their respective importance in the model. The AOA is then derived by applying a threshold based on the DI of the training data, where the DI is calculated with respect to the cross-validation strategy used for model training. We test for the ideal threshold using simulated data and compare the prediction error within the AOA with the cross-validation error of the model, illustrating the approach in a simulated case study. Our simulation study suggests defining the AOA at the 0.95 quantile of the DI in the training data. Using this threshold, the prediction error within the AOA is comparable to the cross-validation RMSE of the model, while the cross-validation error does not apply outside the AOA. This holds for models trained with randomly distributed training data, as well as when training data are clustered in space and spatial cross-validation is applied. We suggest reporting the AOA alongside predictions, complementary to validation measures.
- North America (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.34)
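A hedged sketch of the dissimilarity index (DI) described above: the minimum importance-weighted distance to the training data in predictor space, with the AOA threshold at the 0.95 quantile of the training DI. The weights and data are invented placeholders, and this simplification omits the normalisation used in the actual method:

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(size=(100, 3))            # training predictor values
weights = np.array([0.6, 0.3, 0.1])          # assumed variable importances

def di(points, ref, weights):
    # minimum importance-weighted Euclidean distance to any reference sample
    d = np.linalg.norm((points[:, None, :] - ref[None, :, :]) * weights, axis=2)
    return d.min(axis=1)

# DI of each training sample w.r.t. the remaining training samples
train_di = np.array([di(train[i:i + 1], np.delete(train, i, axis=0), weights)[0]
                     for i in range(len(train))])
threshold = np.quantile(train_di, 0.95)      # AOA threshold from the abstract

new = rng.normal(loc=6.0, size=(5, 3))       # points far outside the training cloud
inside_aoa = di(new, train, weights) <= threshold
print("new points inside AOA:", int(inside_aoa.sum()), "of", len(new))
```

Points well outside the training cloud exceed the threshold and fall outside the AOA, which is exactly where the cross-validation error no longer applies.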
MR Imaging–Based Radiomic Signatures of Distinct Molecular Subgroups of Medulloblastoma
BACKGROUND AND PURPOSE: Distinct molecular subgroups of pediatric medulloblastoma confer important differences in prognosis and therapy. Currently, tissue sampling is the only method to obtain the information needed for classification. Our goal was to develop and validate radiomic and machine learning approaches for predicting the molecular subgroups of pediatric medulloblastoma. MATERIALS AND METHODS: In this multi-institutional retrospective study, we evaluated MR imaging datasets of 109 pediatric patients with medulloblastoma from 3 children's hospitals, from January 2001 to January 2014. A computational framework was developed to extract MR imaging–based radiomic features from tumor segmentations, and we tested 2 validation schemes: a double 10-fold cross-validation using a combined dataset consisting of all 3 patient cohorts, and a 3-dataset cross-validation in which training was performed on 2 cohorts and testing on the third, independent cohort. We used the Wilcoxon rank sum test for feature selection and the area under the receiver operating characteristic curve to evaluate model performance.
- North America > United States > California > Santa Clara County > Palo Alto (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- Health & Medicine > Therapeutic Area > Oncology > Brain Cancer (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
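The "3-dataset cross-validation" described above amounts to leave-one-cohort-out validation, which can be sketched with scikit-learn's `LeaveOneGroupOut`. The features, labels, and cohort assignments below are synthetic stand-ins, not the study's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(7)
X = rng.normal(size=(109, 20))               # 109 patients, placeholder radiomic features
y = rng.integers(0, 2, size=109)             # placeholder binary subgroup label
cohort = rng.integers(0, 3, size=109)        # which of the 3 hospitals

# train on two hospital cohorts, test on the held-out third, for each cohort
aucs = []
for tr, te in LeaveOneGroupOut().split(X, y, groups=cohort):
    clf = RandomForestClassifier(random_state=0).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
print("per-cohort AUCs:", [round(a, 2) for a in aucs])
```

Holding out an entire institution tests generalisation across scanners and protocols, which the pooled double 10-fold scheme cannot.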
How Feature Engineering Can Help You Do Well in a Kaggle Competition – Part 2
That post described some preliminary but important data science tasks performed for the competition, like exploratory data analysis and feature engineering, using a Spark cluster deployed on Google Dataproc. It was necessary to separate a validation set from clicks_train.csv. Machine learning models were trained on the train set data and their accuracy was evaluated on the validation set data by comparing the predictions with the ground-truth labels (clicks). As we optimize CV model accuracy -- by testing different feature engineering approaches, algorithms, and hyperparameter tuning -- we expect to improve our score on the competition Leaderboard (LB) accordingly (test set). The categorical fields whose average CTR presented higher predictive accuracy on the CV score were ad_document_id, ad_source_id, ad_publisher_id, ad_advertiser_id, ad_campain_id, document attributes (category_ids, topics_ids, entities_ids), and their combinations with event_country_id, which modeled regional user preferences.
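The average-CTR features described in the post can be sketched in a few lines of pandas. The toy frames and the global-CTR fallback for unseen ids are illustrative assumptions, not the post's exact code:

```python
import pandas as pd

train = pd.DataFrame({"ad_advertiser_id": [1, 1, 2, 2, 2, 3],
                      "clicked":          [1, 0, 1, 1, 0, 0]})
valid = pd.DataFrame({"ad_advertiser_id": [1, 2, 3, 4]})

# average click-through rate per advertiser, computed on the train split only
ctr = train.groupby("ad_advertiser_id")["clicked"].mean().rename("avg_ctr")
valid = valid.join(ctr, on="ad_advertiser_id")
# ids unseen in training get NaN; fall back to the global CTR
valid["avg_ctr"] = valid["avg_ctr"].fillna(train["clicked"].mean())
print(valid)
```

Computing the statistic on the train split and only joining it onto the validation split is what keeps the CV score honest: the feature never sees the validation clicks.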
Cross-validation failure: small sample sizes lead to large error bars
Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establishing their validity and usefulness is cross-validation: testing prediction on unseen data. Here, I would like to raise awareness of the error bars of cross-validation, which are often underestimated. Simple experiments show that the sample sizes of many neuroimaging studies inherently lead to large error bars, e.g. $\pm$10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers, or of methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in the heterogeneity of the data.
- North America > United States (0.14)
- Europe > Portugal > Braga > Braga (0.04)
- Europe > United Kingdom (0.04)
- Europe > France > Île-de-France (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Information Technology > Modeling & Simulation (1.00)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.88)
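The abstract's headline figure can be checked with a normal-approximation binomial confidence interval on a proportion: with 100 test samples and 70% accuracy, the 95% interval half-width is roughly 9%, in line with the stated $\pm$10%.

```python
import math

def binomial_ci_halfwidth(p, n, z=1.96):
    # normal-approximation 95% CI half-width for a proportion of n trials
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1000, 10000):
    hw = binomial_ci_halfwidth(0.70, n)
    print(f"n={n:>5}: accuracy 0.70 +/- {hw:.3f}")
```

The half-width shrinks only as $1/\sqrt{n}$, which is why the standard error computed across a handful of folds (each sharing the same small test pool) badly understates the true uncertainty.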
Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
Varoquaux, Gaël, Raamana, Pradeep Reddy, Engemann, Denis, Hoyos-Idrobo, Andrés, Schwartz, Yannick, Thirion, Bertrand
Decoding, i.e. prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyper-parameters. This paper is a review of cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI, and MEG) and on simulations. Theory and experiments show that the popular "leave-one-out" strategy leads to unstable and biased estimates, and that a repeated-random-splits method should be preferred. Experiments highlight the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of 10%. Nested cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > France > Île-de-France (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
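The recommendation above, repeated random splits rather than leave-one-out, can be sketched with scikit-learn. The data are synthetic and the comparison only shows the mechanics; the instability findings are the paper's:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, ShuffleSplit, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 10))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)

clf = LogisticRegression()
loo = cross_val_score(clf, X, y, cv=LeaveOneOut())     # 60 folds of size 1
ss = cross_val_score(clf, X, y,
                     cv=ShuffleSplit(n_splits=50, test_size=0.2,
                                     random_state=0))  # 50 random 80/20 splits
print(f"LOO mean {loo.mean():.2f}; ShuffleSplit mean {ss.mean():.2f}")
```

Each leave-one-out fold scores 0 or 1 on a single sample, so the per-fold scores are maximally noisy; the 50 repeated 80/20 splits average over many test sets of reasonable size, which is the stability argument the paper makes.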