AITopics

2112.0322

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.82)

Lee, Jonghyeon, De Brouwer, Edward, Hamzi, Boumediene, Owhadi, Houman

Learning dynamical systems from data: A simple cross-validation perspective, part III: Irregularly-Sampled Time Series

arXiv.org Machine Learning Nov-25-2021

A simple and interpretable way to learn a dynamical system from data is to interpolate its vector field with a kernel. This strategy is highly efficient, in terms of both accuracy and complexity, when the kernel is data-adapted using Kernel Flows (KF) [Owhadi19], which uses gradient-based optimization to learn a kernel on the premise that a kernel is good if there is no significant loss in accuracy when only half of the data is used for interpolation. Despite its previous successes, this strategy of interpolating the vector field that drives the dynamical system breaks down when the observed time series is not regularly sampled in time. In this work, we propose to address this problem by directly approximating the vector field of the dynamical system while incorporating the time differences between observations in the KF data-adapted kernels. We compare our approach with the classical one on several benchmark dynamical systems and show that it significantly improves forecasting accuracy while remaining simple, fast, and robust.
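The "half of the data" premise in this abstract can be written as a loss. The following is a sketch of the Kernel Flows criterion as it is usually stated in the cited KF work, not something spelled out in the abstract itself; the symbols are assumptions of this note: $K_\theta$ is the kernel with parameters $\theta$, $(X_b, y_b)$ is a batch of data with kernel interpolant $u^b$, and $(X_c, y_c)$ is a random half of that batch with interpolant $u^c$.

```latex
\rho(\theta)
  = \frac{\lVert u^b - u^c \rVert_{K_\theta}^2}{\lVert u^b \rVert_{K_\theta}^2}
  = 1 - \frac{y_c^\top K_\theta(X_c, X_c)^{-1} y_c}
             {y_b^\top K_\theta(X_b, X_b)^{-1} y_b},
\qquad 0 \le \rho(\theta) \le 1 .
```

A value of $\rho$ close to 0 means that dropping half of the batch barely changes the interpolant, so gradient steps on $\theta$ that reduce $\rho$ move toward a "good" kernel in the sense described above.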

dynamical system, kernel, time series, (13 more...)

2111.13037

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

#artificialintelligence Nov-23-2021, 08:36:50 GMT

Top 7 cross validation techniques with Python Code - Analytics Vidhya

Not suitable for time series data: for time series data, the order of the samples matters, but in Stratified Cross-Validation samples are selected in random order. LeavePOut cross-validation is an exhaustive cross-validation technique in which p samples are used as the validation set and the remaining n-p samples are used as the training set. Suppose we have 100 samples in the dataset. If we use p = 10, then in each iteration 10 samples are used as the validation set and the remaining 90 samples as the training set. This process is repeated until every possible combination of p validation samples and n-p training samples has been used. Every data sample is therefore used for both training and validation.
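As a minimal sketch of the leave-p-out scheme described above, the splits can be enumerated with the standard library alone. The function name `leave_p_out_splits` is hypothetical and chosen for illustration; it is not the article's code.

```python
from itertools import combinations
from math import comb


def leave_p_out_splits(n_samples, p):
    """Yield (train_indices, validation_indices) for every size-p validation set."""
    all_indices = range(n_samples)
    for validation in combinations(all_indices, p):
        held_out = set(validation)
        # Everything that is not held out goes into the training set.
        train = [i for i in all_indices if i not in held_out]
        yield train, list(validation)


# Small example: 5 samples, p = 2 gives C(5, 2) = 10 splits.
splits = list(leave_p_out_splits(5, 2))
print(len(splits))  # 10
print(splits[0])    # ([2, 3, 4], [0, 1])

# The 100-sample, p = 10 case from the text is far too large to enumerate.
print(comb(100, 10))
```

The number of splits grows as C(n, p), which for n = 100 and p = 10 is more than 10^13, so exhaustive leave-p-out is practical only for small datasets or small p.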

dataset, training and validation, validation, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

#artificialintelligence Oct-7-2021, 11:55:12 GMT

Cross Validation

"Cross Validation": anyone who has just started dabbling in the world of Data Science must have seen this term. In this article, cross validation, the need for it, and some of its types are explained in very simple words. Let's start with a basic question before going into details: why do we need cross validation? It helps us compare different machine learning algorithms. To evaluate how well a machine learning algorithm is working, we need to train it on one part of the data (known as "Training Data") and test it on unseen data (known as "Testing Data").

cross validation, test data, validation, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

arXiv.org Machine Learning Sep-14-2021

Targeted Cross-Validation

Zhang, Jiawei, Ding, Jie, Yang, Yuhong

In many applications, we have access to the complete dataset but are only interested in the prediction of a particular region of predictor variables. A standard approach is to find the globally best modeling method from a set of candidate methods. However, it is perhaps rare in reality that one candidate method is uniformly better than the others. A natural approach for this scenario is to apply a weighted $L_2$ loss in performance assessment to reflect the region-specific interest. We propose a targeted cross-validation (TCV) to select models or procedures based on a general weighted $L_2$ loss. We show that the TCV is consistent in selecting the best performing candidate under the weighted $L_2$ loss. Experimental studies are used to demonstrate the use of TCV and its potential advantage over the global CV or the approach of using only local data for modeling a local region. Previous investigations on CV have relied on the condition that when the sample size is large enough, the ranking of two candidates stays the same. However, in many applications with the setup of changing data-generating processes or highly adaptive modeling methods, the relative performance of the methods is not static as the sample size varies. Even with a fixed data-generating process, it is possible that the ranking of two methods switches infinitely many times. In this work, we broaden the concept of the selection consistency by allowing the best candidate to switch as the sample size varies, and then establish the consistency of the TCV. This flexible framework can be applied to high-dimensional and complex machine learning scenarios where the relative performances of modeling procedures are dynamic.
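To illustrate why a weighted $L_2$ loss can change which candidate is selected, here is a small sketch. It uses a single validation split for brevity and is not the TCV procedure from the paper; the helper names, the toy data, and the indicator weight function are all assumptions made for illustration.

```python
def weighted_l2_loss(y_true, y_pred, weights):
    """Weighted mean squared error; weights express interest in each point."""
    total_weight = sum(weights)
    weighted_sq = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return weighted_sq / total_weight


def select_by_weighted_loss(candidates, x_val, y_val, weight_fn):
    """Return the candidate name with the smallest weighted L2 loss, plus all losses."""
    weights = [weight_fn(x) for x in x_val]
    losses = {
        name: weighted_l2_loss(y_val, [predict(x) for x in x_val], weights)
        for name, predict in candidates.items()
    }
    return min(losses, key=losses.get), losses


# Toy validation data generated from y = x^2.
x_val = [0.0, 0.5, 1.0, 3.0, 4.0]
y_val = [x ** 2 for x in x_val]

# Two hypothetical fitted candidates: one accurate near 0, one accurate for large x.
candidates = {
    "local_line": lambda x: x,
    "global_line": lambda x: 4.0 * x - 3.0,
}

best_global, _ = select_by_weighted_loss(candidates, x_val, y_val, lambda x: 1.0)
best_target, _ = select_by_weighted_loss(
    candidates, x_val, y_val, lambda x: 1.0 if x <= 1.0 else 0.0
)
print(best_global)  # global_line
print(best_target)  # local_line
```

With uniform weights the candidate that fits large x wins; when the weight is concentrated on the region x <= 1, the ranking flips, which is the kind of region-specific selection the abstract motivates.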

candidate method, procedure, tcv, (17 more...)

2109.06949

Country: North America > United States > Minnesota (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.66)

Industry: Banking & Finance > Real Estate (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

Adler, Afek Ilay, Painsky, Amichai

Feature Importance in Gradient Boosting Trees with Cross-Validation Feature Selection

arXiv.org Machine Learning Sep-12-2021

Gradient Boosting Machines (GBM) are among the go-to algorithms for tabular data and produce state-of-the-art results in many prediction tasks. Despite its popularity, the GBM framework suffers from a fundamental flaw in its base learners. Specifically, most implementations utilize decision trees that are typically biased towards categorical variables with large cardinalities. The effect of this bias has been extensively studied over the years, mostly in terms of predictive performance. In this work, we extend the scope and study the effect of biased base learners on GBM feature importance (FI) measures. We show that although these implementations demonstrate highly competitive predictive performance, they still, surprisingly, suffer from bias in FI. By utilizing cross-validated (CV) unbiased base learners, we fix this flaw at a relatively low computational cost. We demonstrate the suggested framework in a variety of synthetic and real-world setups, showing a significant improvement in all GBM FI measures while maintaining roughly the same level of prediction accuracy.

categorical feature, fi measure, implementation, (14 more...)

2109.05468

Country:

Oceania > New Zealand > North Island > Waikato > Hamilton (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.41)

#artificialintelligence Aug-30-2021, 19:02:17 GMT

Cross-Validation Techniques

Time Series Cross-Validation Method 14. Blocked Cross-Validation Method 15.

artificial intelligence, cross-validation technique, machine learning, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.98)

#artificialintelligence Jul-23-2021, 17:41:45 GMT

Full cross-validation and generating learning curves for time-series models - KDnuggets

Time series analysis is needed in almost any quantitative field and in real-life systems that collect data over time, i.e., temporal datasets. Building predictive models on temporal datasets for the future evolution of the systems under consideration is usually called forecasting. The validation of such models deviates from the standard holdout method used in supervised learning, which takes random disjoint splits into train, test, and validation sets. This stems from the fact that time series are ordered, and order induces all sorts of statistical properties that should be retained. For this reason, directly applying cross-validation to time-series model building is not possible; validation is restricted to out-of-sample (OOS) validation, which uses the end-tail of a temporal set as a single test set.
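The out-of-sample scheme mentioned above can be sketched in a few lines: the series is never shuffled, and the last observations form the single test set. The function name `out_of_sample_split` is hypothetical and not taken from the article.

```python
def out_of_sample_split(series, test_size):
    """Split an ordered series into (train, test), keeping the end-tail as test."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    # Order is preserved: everything before the tail is training data.
    return series[:-test_size], series[-test_size:]


series = list(range(10))  # stand-in for 10 time-ordered observations
train, test = out_of_sample_split(series, 3)
print(train)  # [0, 1, 2, 3, 4, 5, 6]
print(test)   # [7, 8, 9]
```

Because every training point precedes every test point, the model is never evaluated on data that is older than what it was fitted on.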

full cross-validation and generating, time series, time-series model, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.70)

Stephenson, William T., Frangella, Zachary, Udell, Madeleine, Broderick, Tamara

Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression

arXiv.org Machine Learning Jul-19-2021

Models like LASSO and ridge regression are extensively used in practice due to their interpretability, ease of use, and strong theoretical guarantees. Cross-validation (CV) is widely used for hyperparameter tuning in these models, but do practical optimization methods minimize the true out-of-sample loss? A recent line of research promises to show that the optimum of the CV loss matches the optimum of the out-of-sample loss (possibly after simple corrections). It remains to show how tractable it is to minimize the CV loss. In the present paper, we show that, in the case of ridge regression, the CV loss may fail to be quasiconvex and thus may have multiple local optima. We can guarantee that the CV loss is quasiconvex in at least one case: when the spectrum of the covariate matrix is nearly flat and the noise in the observed responses is not too high. More generally, we show that quasiconvexity status is independent of many properties of the observed data (response norm, covariate-matrix right singular vectors and singular-value scaling) and has a complex dependence on the few that remain. We empirically confirm our theory using simulated experiments.
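To make "the CV loss as a function of the regularization strength" concrete, the sketch below evaluates a leave-one-out CV loss for one-dimensional ridge regression (no intercept) on a grid of penalties. This is an illustrative assumption of this note, not the paper's analysis: the toy data, the function names, and the choice of a grid scan (which does not rely on the loss having a single local optimum) are all hypothetical.

```python
def ridge_fit_1d(x, y, lam):
    """Closed-form 1-D ridge coefficient: argmin_b sum (y - b*x)^2 + lam * b^2."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def loocv_loss(x, y, lam):
    """Leave-one-out mean squared error of 1-D ridge for a given penalty (lam > 0)."""
    total = 0.0
    for i in range(len(x)):
        # Refit on all points except the i-th, then score on the i-th.
        beta = ridge_fit_1d(x[:i] + x[i + 1:], y[:i] + y[i + 1:], lam)
        total += (y[i] - beta * x[i]) ** 2
    return total / len(x)


def grid_search(x, y, lambdas):
    """Evaluate the CV loss on every grid point and return the best penalty."""
    losses = [loocv_loss(x, y, lam) for lam in lambdas]
    best_index = min(range(len(lambdas)), key=losses.__getitem__)
    return lambdas[best_index], losses


x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_lam, losses = grid_search(x, y, lambdas)
print(best_lam)
print(losses)
```

A grid scan only samples the loss at the chosen penalties; if the CV loss has several local optima, as the abstract says can happen, a finer grid or multiple restarts of a local optimizer may be needed.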

assumption 3, matrix, quasiconvexity, (14 more...)

2107.09194

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.61)

#artificialintelligence Jun-11-2021, 13:31:00 GMT

Leave-One-Out Cross-Validation

It is a technique in which we implement K-fold cross-validation with k equal to n, i.e., the number of observations in the data. Thus, every single point is used in a validation set, and we create n models for the n observations in the data. Each point/sample is used once as a test set while the remaining data/samples form the training set. The scikit-learn Python machine learning library provides an implementation of LOOCV via the LeaveOneOut class, a Leave-One-Out cross-validator.
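The procedure described above can be sketched without any dependencies. The helper names and the mean predictor used as the "model" are assumptions for illustration; in scikit-learn, `LeaveOneOut().split(X)` from `sklearn.model_selection` yields the corresponding train/test index arrays.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with exactly one test sample per split."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]


def loocv_mse_mean_predictor(y):
    """LOOCV mean squared error of a model that predicts the training mean."""
    errors = []
    for train_idx, test_idx in leave_one_out_splits(len(y)):
        prediction = sum(y[j] for j in train_idx) / len(train_idx)
        errors.append((y[test_idx[0]] - prediction) ** 2)
    return sum(errors) / len(errors)


y = [1.0, 2.0, 3.0, 4.0]
n_splits = len(list(leave_one_out_splits(len(y))))
print(n_splits)  # 4: one model per observation
print(loocv_mse_mean_predictor(y))
```

Since n models are fitted for n observations, the cost grows linearly with the dataset size times the cost of one fit, which is why LOOCV is usually reserved for small datasets or cheap models.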

dataset, implementation, leave-one-out cross-validation

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.74)