AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

A guide for using the Wavelet Transform in Machine Learning

#artificialintelligence | Oct-14-2019, 07:24:27 GMT

A previous blog post showed how signal-processing techniques can be used to classify time series and signals. In short: the Fourier Transform converts a signal from the time domain to the frequency domain. The peaks in the frequency spectrum indicate the frequencies that occur most often in the signal. The larger and sharper a peak is, the more prevalent that frequency is in the signal. The location (frequency value) and height (amplitude) of the peaks in the frequency spectrum can then be used as input features for classifiers such as Random Forest or Gradient Boosting.
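
The peak-based feature extraction summarized above can be sketched in a few lines. This is a minimal illustration, not the blog post's own code: the helper names `dft_magnitudes` and `top_peaks` and the synthetic two-tone test signal are assumptions made for the example, and a naive O(n^2) discrete Fourier transform stands in for a real FFT library.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Amplitude spectrum of a real signal via a naive O(n^2) DFT.

    For bins 0 < k < n/2 the value 2*|X_k|/n equals the amplitude of a
    sinusoid at that bin (the DC bin k=0 is not used by top_peaks below).
    """
    n = len(signal)
    mags = []
    for k in range(n // 2):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(2 * abs(coeff) / n)
    return mags

def top_peaks(mags, count):
    """Return (bin_index, amplitude) for the `count` largest local maxima."""
    peaks = [(k, mags[k]) for k in range(1, len(mags) - 1)
             if mags[k] > mags[k - 1] and mags[k] > mags[k + 1]]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:count]

# Hypothetical test signal: 1 second sampled at 64 Hz, so bin k is k Hz.
# It mixes a 5 Hz tone (amplitude 1.0) with a 12 Hz tone (amplitude 0.5).
sample_rate = 64
signal = [math.sin(2 * math.pi * 5 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
          for t in range(sample_rate)]

mags = dft_magnitudes(signal)
features = top_peaks(mags, 2)  # peak locations and heights as classifier features
```

Each `(frequency, amplitude)` pair in `features` would become one or two columns of the feature matrix handed to a classifier.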

fourier transform, frequency, wavelet transform, (6 more...)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.58)

A note on the consistency of the random forest algorithm

Ferreira, José A.

arXiv.org Machine Learning | Oct-14-2019

Nowadays, the random forest algorithm is acknowledged to be easy to use and to perform very well in general, even in problems involving many predictor variables (see for instance Biau and Scornet (2016) or the introduction to Scornet, Biau and Vert (2015)); so well, indeed, that several authors have posed and studied the question of its consistency (see Scornet, Biau and Vert (2015) and the earlier references provided by them). Consistent nonparametric statistical predictors have been known for a long time (e.g. Nadaraya (1964), Watson (1964), Stone (1977), Devroye and Wagner (1980)), but they converge very slowly and their computer implementations tend to be slow, especially when they involve many variables. In view of their comparative accuracy and high speed of implementation, random forests would become even more attractive if they were shown to be consistent under general data-generating mechanisms. Besides, consistency is almost indispensable in applications of statistical prediction to the estimation of 'causal effects' based on observational data (e.g.

artificial intelligence, machine learning, (17 more...)

arXiv:1910.00943

Country:

North America > United States > New York (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Predicting movie revenue with AdaBoost, XGBoost and LightGBM

#artificialintelligence | Oct-10-2019, 00:38:09 GMT

Marvel's Avengers: Endgame recently dethroned Avatar as the highest-grossing movie in history. While there was no doubt that this movie would be very successful, I want to understand what makes any given movie a success. I am using data from The Movie Database, provided through Kaggle. The data set is split into a train set of 3,000 movies and a test set of 4,398 movies. Both sets contain 22 features, including budget, genres, belongs_to_collection, runtime, keywords and more. The train set also contains the target variable, revenue.

algorithm, movie, revenue, (14 more...)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

NGBoost: Natural Gradient Boosting for Probabilistic Prediction

Duan, Tony, Avati, Anand, Ding, Daisy Yi, Basu, Sanjay, Ng, Andrew Y., Schuler, Alejandro

arXiv.org Machine Learning | Oct-9-2019

We present Natural Gradient Boosting (NGBoost), an algorithm which brings probabilistic prediction capability to gradient boosting in a generic way. Predictive uncertainty estimation is crucial in many applications such as healthcare and weather forecasting. Probabilistic prediction, which is the approach where the model outputs a full probability distribution over the entire outcome space, is a natural way to quantify those uncertainties. Gradient Boosting Machines have been widely successful in prediction tasks on structured input data, but a simple boosting solution for probabilistic prediction of real valued outputs is yet to be made. NGBoost is a gradient boosting approach which uses the Natural Gradient to address technical challenges that make generic probabilistic prediction hard with existing gradient boosting methods. Our approach is modular with respect to the choice of base learner, probability distribution, and scoring rule. We show empirically on several regression datasets that NGBoost provides competitive predictive performance of both uncertainty estimates and traditional metrics.
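
To make the natural-gradient idea concrete, here is a minimal sketch. It is not the NGBoost library or its API: it assumes a Normal distribution parametrized by (mu, log sigma), uses the negative log-likelihood as the scoring rule, and fits only a single unconditional distribution, with no base learners and no boosting. The function names and the four-point data set are hypothetical. For this parametrization the Fisher information is diag(1/sigma^2, 2), so the natural gradient is the ordinary gradient rescaled by diag(sigma^2, 1/2).

```python
import math

def natural_gradient(y, mu, log_sigma):
    """Natural gradient of the Normal negative log-likelihood at one point y."""
    sigma2 = math.exp(2 * log_sigma)
    # Ordinary gradient of log(sigma) + (y - mu)^2 / (2 sigma^2)
    grad_mu = -(y - mu) / sigma2
    grad_log_sigma = 1.0 - (y - mu) ** 2 / sigma2
    # Multiply by the inverse Fisher information, diag(sigma^2, 1/2)
    return sigma2 * grad_mu, 0.5 * grad_log_sigma

def fit_normal(ys, n_steps=500, learning_rate=0.1):
    """Fit (mu, sigma) by natural-gradient descent on the mean score."""
    mu, log_sigma = 0.0, 0.0
    for _ in range(n_steps):
        grads = [natural_gradient(y, mu, log_sigma) for y in ys]
        mu -= learning_rate * sum(g[0] for g in grads) / len(ys)
        log_sigma -= learning_rate * sum(g[1] for g in grads) / len(ys)
    return mu, math.exp(log_sigma)

# Hypothetical data: sample mean 5, population variance 5
mu_hat, sigma_hat = fit_normal([2.0, 4.0, 6.0, 8.0])
```

Note that the rescaled update for mu no longer depends on sigma, which is one reason the natural gradient tends to behave well when the predicted scale is far from its optimum. In a boosting setting, base learners would be fit to these per-example natural gradients instead of being averaged as they are here.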

gradient, natural gradient, prediction, (14 more...)

arXiv:1910.03225

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

How to train Boosted Trees models in TensorFlow

#artificialintelligence | Oct-8-2019, 22:54:44 GMT

Tree ensemble methods such as gradient boosted decision trees and random forests are among the most popular and effective machine learning tools available when working with structured data. Tree ensemble methods are fast to train, work well without a lot of tuning, and do not require large datasets to train on. In TensorFlow, gradient boosted trees are available using the tf.estimator API, which also supports deep neural networks, wide-and-deep models, and more. For boosted trees, regression with pre-defined mean squared error loss (BoostedTreesRegressor) and classification with cross entropy loss (BoostedTreesClassifier) are supported.
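
As a rough illustration of what gradient boosted regression with a mean squared error loss does, the sketch below boosts one-split trees (stumps) on their residuals. It deliberately avoids the TensorFlow estimator API and uses only the Python standard library; the function names, the toy data and the hyperparameters are assumptions made for the example.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on 1-D inputs; returns a predict function."""
    mean_all = sum(targets) / len(targets)
    best = (float("inf"), None, mean_all, mean_all)  # (sse, threshold, left, right)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    if thr is None:  # all inputs identical: fall back to a constant
        return lambda x: mean_all
    return lambda x: lm if x <= thr else rm

def boost(xs, ys, n_rounds, learning_rate):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - prediction)^2
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data: y = x^2 on a regular grid
xs = [i / 20 for i in range(20)]
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)
mse_initial = mse(lambda x: mean_y, xs, ys)
model = boost(xs, ys, n_rounds=50, learning_rate=0.3)
mse_final = mse(model, xs, ys)
```

Comparing `mse_final` with `mse_initial` shows how far the ensemble improves on the constant baseline on the training data. Library implementations add deeper trees, regularization and faster split finding on top of this basic loop.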

feature importance, interpretability, prediction, (16 more...)

Country: Europe (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.78)

microsoft/LightGBM

#artificialintelligence | Oct-8-2019, 05:33:25 GMT

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. For further details, please refer to Features. LightGBM is widely used in many winning solutions of machine learning competitions. Comparison experiments on public datasets show that LightGBM can outperform existing boosting frameworks on both efficiency and accuracy, with significantly lower memory consumption. In addition, parallel experiments show that LightGBM can achieve a linear speed-up by using multiple machines for training in specific settings.

github, lightgbm, microsoft lightgbm, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.39)

Random forest model identifies serve strength as a key predictor of tennis match outcome

Gao, Zijian, Kowalczyk, Amanda

arXiv.org Machine Learning | Oct-8-2019

Tennis is a popular sport worldwide, boasting millions of fans and numerous national and international tournaments. Like many sports, tennis has benefitted from the popularity of rigorous record-keeping of game and player information, as well as the growth of machine learning methods for use in sports analytics. Of particular interest to bettors and betting companies alike is the potential use of sports records to predict tennis match outcomes prior to match start. We compiled, cleaned, and used the largest database of tennis match information to date to predict match outcome using fairly simple machine learning methods. Using such methods allows for rapid fit and prediction times to readily incorporate new data and make real-time predictions. We were able to predict match outcomes with upwards of 80% accuracy, much greater than predictions using betting odds alone, and identify serve strength as a key predictor of match outcome. By combining prediction accuracies from three models, we were able to nearly recreate a probability distribution based on average betting odds from betting companies, which indicates that betting companies are using similar information to assign odds to matches. These results demonstrate the capability of relatively simple machine learning models to quite accurately predict tennis match outcomes.

accuracy, match outcome, serve strength, (10 more...)

arXiv:1910.03203

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Georgia > Floyd County > Rome (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.43)

A Guide to XGBoost in Python

#artificialintelligence | Oct-7-2019, 12:34:58 GMT

In this article, we will look at various aspects of the XGBoost library. XGBoost is one of the most reliable machine learning libraries when dealing with huge datasets. In my previous article, I gave a brief introduction to XGBoost and how to use it. This article mainly explores many of the useful features of XGBoost. When using machine learning libraries, it is not only about building state-of-the-art models.

dataset, dmatrix, xgboost, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

NFL Bet Predictor: Random Forest (Machine Learning Model) Week 5 Picks

#artificialintelligence | Oct-7-2019, 03:58:15 GMT

Our Random Forest model predicts a 66% probability of the OVER 41 points hitting with odds from Westgate in this matchup. The expected value is 30 with a 103 Diff. Check out all the betting info for the Jacksonville Jaguars vs Carolina Panthers on our matchup page. Our Random Forest model predicts a 79% probability of the Indianapolis Colts keeping it within the 5.5 points being offered at the Westgate. The expected value is 50 with a 303 Diff.

machine learning model, probability, random forest model, (5 more...)

Country:

North America > United States > Indiana > Marion County > Indianapolis (0.28)
North America > United States > New York (0.08)
North America > United States > Minnesota (0.08)
North America > United States > Illinois > Cook County > Chicago (0.08)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

The Complete Guide to Decision Trees

#artificialintelligence | Oct-5-2019, 12:57:00 GMT

Bagging (or Bootstrap Aggregation) is used when the goal is to reduce the variance of a decision tree (DT). Variance here refers to the fact that DTs can be quite unstable: small variations in the data might result in a completely different tree being generated. Bagging addresses this by drawing random subsets of the training data, which can be done in parallel, where every observation has the same probability of appearing in each new subset. Next, each subset is used to train a DT, resulting in an ensemble of different DTs. Finally, the predictions of all those DTs are averaged, which gives more robust performance than a single DT.
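
The bagging procedure described above can be sketched as follows. This is an illustrative toy using only the Python standard library, with one-split trees (stumps) standing in for full decision trees; the function names, the noisy step-function data and the ensemble size of 25 are assumptions, not taken from the article.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimizing squared error."""
    mean_all = sum(ys) / len(ys)
    best = (float("inf"), None, mean_all, mean_all)  # (sse, threshold, left, right)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    if thr is None:  # bootstrap sample had a single distinct x value
        return lambda x: mean_all
    return lambda x: lm if x <= thr else rm

def bagging_fit(xs, ys, n_estimators, rng):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # each row equally likely
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Average the predictions of all ensemble members."""
    return sum(m(x) for m in models) / len(models)

# Hypothetical data: a step from 0 to 1 at x = 0.5, plus Gaussian noise
rng = random.Random(0)
xs = [i / 40 for i in range(40)]
ys = [(0.0 if x < 0.5 else 1.0) + rng.gauss(0, 0.1) for x in xs]
models = bagging_fit(xs, ys, n_estimators=25, rng=rng)
low, high = bagging_predict(models, 0.1), bagging_predict(models, 0.9)
```

Because each stump sees a different bootstrap sample, their split points differ slightly, and averaging them smooths out the instability of any single tree. For classification, the average would typically be replaced by a majority vote.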

artificial intelligence, complete guide, machine learning, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.40)
