AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Calibrated Boosting-Forest

Wu, Haozhen

arXiv.org Machine Learning, Nov-13-2017

Excellent ranking power and well-calibrated probability estimates are both needed in many classification tasks. In this paper, we introduce a technique, Calibrated Boosting-Forest, that captures both. This novel technique is an ensemble of gradient boosting machines that can support both continuous and binary labels. While offering superior ranking power over any individual regression or classification model, Calibrated Boosting-Forest is able to preserve well-calibrated posterior probabilities. Along with these benefits, we provide an alternative to the tedious step of tuning gradient boosting machines. We demonstrate that tuning Calibrated Boosting-Forest can be reduced to a simple hyper-parameter selection. We further establish that increasing this hyper-parameter improves the ranking performance with diminishing returns. We examine the effectiveness of Calibrated Boosting-Forest on ligand-based virtual screening, where both continuous and binary labels are available, and compare the performance of Calibrated Boosting-Forest with logistic regression, gradient boosting machine and deep learning. Calibrated Boosting-Forest achieved an approximately 48% improvement compared to a state-of-the-art deep learning model. Moreover, it achieved around 95% improvement on a probability quality measurement compared to the best individual gradient boosting machine. Calibrated Boosting-Forest offers a benchmark demonstration that, in the field of ligand-based virtual screening, deep learning is not the universally dominant machine learning model and well-calibrated probabilities can better facilitate the virtual screening process.
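
The abstract separates ranking power from probability calibration. The sketch below is not the paper's method or its metrics. It only illustrates the distinction, using AUC for ranking and the Brier score for probability quality (both chosen here as assumptions) on made-up labels and scores. A monotone distortion of the probabilities leaves the ranking untouched but changes the probability quality.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(labels, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Illustrative data only (not taken from the paper).
labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

# Squaring is monotone on [0, 1]: the ordering of the scores is preserved,
# but the values no longer mean the same thing as probabilities.
distorted = [p ** 2 for p in probs]

print("AUC   original / distorted:", auc(labels, probs), auc(labels, distorted))
print("Brier original / distorted:", brier(labels, probs), brier(labels, distorted))
```

On this toy data the two AUC values are identical while the Brier scores differ, which is why a model can rank well and still need a separate calibration step.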

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

1710.05476

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Introduction To Random Forest - Simplified Business Case Study

#artificialintelligence, Nov-7-2017, 15:40:22 GMT

With the increase in computational power, we can now choose algorithms that perform very intensive calculations. One such algorithm is "Random Forest", which we will discuss in this article. While the algorithm is very popular in various competitions (e.g. Before going any further, here is an example of the importance of choosing the best algorithm. Yesterday, I saw a movie called "Edge of Tomorrow".

artificial intelligence, machine learning, prediction, (17 more...)

#artificialintelligence

Country: North America > Mexico (0.07)

Industry: Media > Film (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)

What is the difference between Bagging and Boosting?

@machinelearnbot, Nov-6-2017, 22:13:07 GMT

Bagging and Boosting are similar in that they are both ensemble techniques, where a set of weak learners is combined to create a strong learner that obtains better performance than a single one. So, let's start from the beginning: Ensemble is a Machine Learning concept in which the idea is to train multiple models using the same learning algorithm. Ensembles are part of a larger group of methods, called multiclassifiers, where a set of hundreds or thousands of learners with a common objective are fused together to solve the problem. The second group of multiclassifiers contains the hybrid methods. They use a set of learners too, but they can be trained using different learning techniques.
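
To make the contrast concrete, here is a minimal pure-Python sketch that assumes 1-D inputs, labels in {-1, +1}, and decision stumps as the weak learner. All function names and the toy data are illustrative and do not come from the article. Bagging trains each stump independently on a bootstrap resample and takes an unweighted vote. Boosting (AdaBoost here) trains stumps sequentially, reweighting the examples the previous stumps got wrong.

```python
import math
import random


def fit_stump(xs, ys, weights):
    """Return the (threshold, polarity) stump with the lowest weighted error."""
    best = (float("inf"), 0.0, 1)
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if (polarity if x >= t else -polarity) != y)
            if err < best[0]:
                best = (err, t, polarity)
    return best[1], best[2]


def stump_predict(stump, x):
    t, polarity = stump
    return polarity if x >= t else -polarity


def bagging(xs, ys, n_models, rng):
    """Independent stumps on bootstrap resamples, combined by majority vote."""
    n = len(xs)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx],
                                [1.0 / n] * n))
    return lambda x: 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1


def adaboost(xs, ys, n_models):
    """Sequential stumps; misclassified examples gain weight after each round."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_models):
        stump = fit_stump(xs, ys, weights)
        preds = [stump_predict(stump, x) for x in xs]
        err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * stump_predict(s, x) for a, s in ensemble) >= 0 else -1


# Toy data: the positive class is an interval, which no single stump can fit.
xs = list(range(10))
ys = [1 if 3 <= x <= 6 else -1 for x in xs]

single = fit_stump(xs, ys, [1.0 / len(xs)] * len(xs))
bagged = bagging(xs, ys, n_models=25, rng=random.Random(0))
boosted = adaboost(xs, ys, n_models=3)

print("single stump errors:", sum(stump_predict(single, x) != y for x, y in zip(xs, ys)))
print("boosted errors:     ", sum(boosted(x) != y for x, y in zip(xs, ys)))
print("bagged errors:      ", sum(bagged(x) != y for x, y in zip(xs, ys)))
```

On this toy problem the best single stump misclassifies three points, while three boosting rounds fit the training data exactly. The bagged result depends on the random resamples.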

artificial intelligence, learner, machine learning, (18 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.76)

Compact Multi-Class Boosted Trees

Ponomareva, Natalia, Colthurst, Thomas, Hendry, Gilbert, Haykal, Salem, Radpour, Soroush

arXiv.org Machine Learning, Oct-31-2017

Gradient boosted decision trees are a popular machine learning technique, in part because of their ability to give good accuracy with small models. We describe two extensions to the standard tree boosting algorithm designed to increase this advantage. The first improvement extends the boosting formalism from scalar-valued trees to vector-valued trees. This allows individual trees to be used as multiclass classifiers, rather than requiring one tree per class, and drastically reduces the model size required for multiclass problems. We also show that some other popular vector-valued modifications of gradient boosted trees fit into this formulation and can be easily obtained in our implementation. The second extension, layer-by-layer boosting, takes smaller steps in function space, which is empirically shown to lead to faster convergence and to a more compact ensemble. We have added both improvements to the open-source TensorFlow Boosted Trees (TFBT) package, and we demonstrate their efficacy on a variety of multiclass datasets. We expect these extensions will be of particular interest to boosted tree applications that require small models, such as embedded devices, applications requiring fast inference, or applications desiring more interpretable models.

artificial intelligence, gradient, machine learning, (18 more...)

arXiv.org Machine Learning

1710.11547

Country: North America > United States > Maryland > Baltimore (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

TF Boosted Trees: A scalable TensorFlow based framework for gradient boosting

Ponomareva, Natalia, Radpour, Soroush, Hendry, Gilbert, Haykal, Salem, Colthurst, Thomas, Mitrichev, Petr, Grushetsky, Alexander

arXiv.org Machine Learning, Oct-31-2017

TF Boosted Trees (TFBT) is a new open-sourced framework for the distributed training of gradient boosted trees. It is based on TensorFlow, and its distinguishing features include a novel architecture, automatic loss differentiation, layer-by-layer boosting that results in smaller ensembles and faster prediction, principled multi-class handling, and a number of regularization techniques to prevent overfitting.

artificial intelligence, gradient, machine learning, (16 more...)

arXiv.org Machine Learning

1710.11555

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Denoising random forests

Hibino, Masaya, Kimura, Akisato, Yamashita, Takayoshi, Yamauchi, Yuji, Fujiyoshi, Hironobu

arXiv.org Machine Learning, Oct-30-2017

This paper proposes a novel type of random forest, called a denoising random forest, that is robust against noise contained in test samples. Such noise-corrupted samples seriously damage the estimation performance of random forests, since unexpected child nodes are often selected and the leaf nodes that the input sample reaches are sometimes far from those for a clean sample. Our main idea for tackling this problem originates from a binary indicator vector that encodes the traversal path of a sample in the forest. Our proposed method effectively employs this vector by introducing denoising autoencoders into random forests. A denoising autoencoder can be trained with indicator vectors produced from clean and noisy input samples, and non-leaf nodes where incorrect decisions are made can be identified by comparing the input and output of the trained denoising autoencoder. Multiple traversal paths with respect to the nodes with incorrect decisions caused by the noise can then be considered for the estimation.
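
As a rough illustration of the binary indicator vector mentioned in the abstract, the sketch below assumes one bit per tree node, set to 1 when the sample visits that node. The paper's exact encoding is not given here, so this layout, the dict-based tree format, and the function names are all hypothetical. It shows how a small perturbation of the input can flip a decision at the root and produce a very different path vector.

```python
def path_indicator(tree, x, n_nodes):
    """Return a 0/1 list with a 1 at the id of every node the sample visits."""
    vec = [0] * n_nodes
    node = tree
    while True:
        vec[node["id"]] = 1
        if "leaf" in node:
            return vec
        go_right = x[node["feature"]] >= node["threshold"]
        node = node["right"] if go_right else node["left"]


def forest_indicator(forest, x):
    """Concatenate the per-tree path vectors; forest is a list of (tree, n_nodes)."""
    out = []
    for tree, n_nodes in forest:
        out.extend(path_indicator(tree, x, n_nodes))
    return out


# A hand-built toy tree with five nodes (ids 0 to 4); values are illustrative.
tree = {
    "id": 0, "feature": 0, "threshold": 0.5,
    "left": {"id": 1, "leaf": "A"},
    "right": {
        "id": 2, "feature": 1, "threshold": 0.5,
        "left": {"id": 3, "leaf": "B"},
        "right": {"id": 4, "leaf": "C"},
    },
}

clean = [0.6, 0.2]
noisy = [0.4, 0.2]  # noise on feature 0 flips the decision at the root

v_clean = path_indicator(tree, clean, 5)
v_noisy = path_indicator(tree, noisy, 5)
changed = [i for i, (a, b) in enumerate(zip(v_clean, v_noisy)) if a != b]

print("clean path:", v_clean)
print("noisy path:", v_noisy)
print("node ids whose bit changed:", changed)
```

Pairs of clean and noisy vectors like these are the kind of input and target that a denoising autoencoder could be trained on. The autoencoder itself is outside the scope of this sketch.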

artificial intelligence, machine learning, traversal path, (17 more...)

arXiv.org Machine Learning

1710.11004

Country:

North America > United States (0.15)
Asia > Japan > Honshū > Chūbu (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

XGBoost: A Concise Technical Overview

#artificialintelligence, Oct-27-2017, 22:15:35 GMT

"Our single XGBoost model can get to the top three! Our final model just averaged XGBoost models with different random seeds." With entire blogs dedicated to how the sole application of XGBoost can propel one's ranking on Kaggle competitions, it is time we delved deeper into the concepts of XGBoost. Bagging algorithms control for high variance in a model. However, boosting algorithms are considered more effective as they deal with both bias as well as variance (the bias-variance trade-off).
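
The quoted approach, averaging models that differ only in their random seed, can be sketched generically. The code below does not call XGBoost. `train_toy_model` is a hypothetical stand-in whose predictions vary with the seed, and `average_over_seeds` is an illustrative helper that would accept any training function returning a predictor.

```python
import random


def train_toy_model(seed):
    """Hypothetical stand-in for a seeded training run: the seed perturbs the fit."""
    rng = random.Random(seed)
    bias = rng.gauss(0.0, 0.5)  # seed-dependent error in the fitted model
    return lambda x: 2.0 * x + bias


def average_over_seeds(train_fn, seeds):
    """Train one model per seed and return a predictor that averages them."""
    models = [train_fn(seed) for seed in seeds]
    return lambda x: sum(m(x) for m in models) / len(models)


seeds = [0, 1, 2, 3, 4]
ensemble = average_over_seeds(train_toy_model, seeds)

individual = [train_toy_model(seed)(1.0) for seed in seeds]
print("individual predictions at x=1.0:", individual)
print("seed-averaged prediction:       ", ensemble(1.0))
```

Averaging does not change what each model learns. It only smooths out the seed-dependent part of the prediction, which is the variance-reduction effect the snippet attributes to bagging-style methods.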

artificial intelligence, machine learning, xgboost, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Tree Boosting With XGBoost – Why Does XGBoost Win "Every" Machine Learning Competition?

#artificialintelligence, Oct-23-2017, 06:30:23 GMT

Tree boosting has empirically proven to be efficient for predictive mining for both classification and regression. For many years, MART (multiple additive regression trees) has been the tree boosting method of choice. But starting in 2015, a first-to-try, always-winning algorithm surged to the surface: XGBoost. This algorithm re-implements tree boosting and gained popularity by winning Kaggle and other data science competitions. The paper first introduces the supervised learning task and discusses the model selection techniques.

algorithm, artificial intelligence, machine learning, (16 more...)

#artificialintelligence

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Contests & Prizes (0.34)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Simultaneous Matrix Diagonalization for Structural Brain Networks Classification

Mokrov, Nikita, Panov, Maxim, Gutman, Boris A., Faskowitz, Joshua I., Jahanshad, Neda, Thompson, Paul M.

arXiv.org Machine Learning, Oct-14-2017

This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer's disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1710.05213

Country: