AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)

@machinelearnbot Feb-27-2018, 12:45:28 GMT

Imbalance Class Classification using Random Forest

I agree that boosting algorithms are a better choice, but in practice they are not enough on their own. SMOTE would be a good starting point (I would definitely opt for an over-sampling strategy), but there are other options. The scikit-learn-contrib project provides a nice Python implementation of solutions for imbalanced data. The success of any of these techniques depends largely on the nature of your data. Therefore, I would suggest trying different approaches and seeing how they affect your results.
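As a rough illustration of the over-sampling idea in this post, here is a minimal pure-Python sketch of SMOTE-style interpolation: each synthetic point lies on the segment between a minority-class sample and one of its nearest minority-class neighbours. The function name, the toy data, and the parameter defaults are assumptions made for this example; it is not the implementation from the scikit-learn-contrib package.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (invented values, illustration only).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. Whether that helps depends on the data, as the post notes.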

artificial intelligence, decision tree learning, imbalance class classification


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Probst, Philipp, Bischl, Bernd, Boulesteix, Anne-Laure

Tunability: Importance of Hyperparameters of Machine Learning Algorithms

arXiv.org Machine Learning Feb-26-2018

Modern machine learning algorithms for classification or regression such as gradient boosting, random forest and neural networks involve a number of parameters that have to be fixed before running them. Such parameters are commonly denoted as hyperparameters in machine learning, a terminology we also adopt here. The term tuning parameter is also frequently used to denote parameters that should be carefully tuned, i.e., optimized with respect to performance. The users of these algorithms can use defaults of these hyperparameters that are specified in the employed software package, set them to alternative specific values or use a tuning strategy to choose them appropriately for the specific dataset at hand. In this context, we define tunability as the amount of performance gain that can be achieved by setting the considered hyperparameter to the best possible value instead of the default value. The goal of this paper is two-fold. Firstly, we formalize the problem of tuning from a statistical point of view and suggest general measures quantifying the tunability of hyperparameters of algorithms. Secondly, we conduct a large-scale benchmarking study based on 38 datasets from the OpenML platform (Vanschoren et al., 2013) using six of the most common machine learning algorithms for classification and regression and apply our measures to assess the tunability of their parameters. The results yield interesting insights into the investigated hyperparameters that in some cases allow general conclusions on their tunability. Our results may help users of the algorithms to decide whether it is worth conducting a possibly time-consuming tuning strategy, to focus on the most important hyperparameters and to choose adequate hyperparameter spaces for tuning.
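The definition of tunability in this abstract can be made concrete with a small sketch: for one hyperparameter on one dataset, take the risk at the default value and subtract the best risk found over the candidate values. The function name, the hyperparameter, and the error rates below are invented for illustration and are not taken from the paper, whose formal measures may differ in detail.

```python
def tunability(risk_by_value, default):
    """Risk at the default setting minus the best achievable risk.

    risk_by_value maps each candidate hyperparameter value to a risk
    (lower is better), e.g. a cross-validated error rate.
    """
    return risk_by_value[default] - min(risk_by_value.values())


# Invented error rates for a hypothetical max_depth hyperparameter.
error_by_max_depth = {2: 0.21, 4: 0.17, 8: 0.15, 16: 0.18}

# Gain available from tuning when the package default is assumed to be 4.
gain = tunability(error_by_max_depth, default=4)
```

A value near zero suggests the default is already close to the best setting, so tuning that hyperparameter is unlikely to repay the compute spent on it.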

artificial intelligence, hyperparameter, machine learning

1802.09596

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)

Nitanda, Atsushi, Suzuki, Taiji

Functional Gradient Boosting based on Residual Network Perception

arXiv.org Machine Learning Feb-25-2018

Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well. One attractive viewpoint on ResNet is that it is optimizing the risk in a functional space by combining an ensemble of effective features. In this paper, we adopt this viewpoint to construct a new gradient boosting method, which is known to be very powerful in data analysis. To do so, we formalize the gradient boosting perspective of ResNet mathematically using the notion of functional gradients and propose a new method called ResFGB for classification tasks by leveraging ResNet perception. Two types of generalization guarantees are provided from the optimization perspective: one is the margin bound and the other is the expected risk bound by the sample-splitting technique. Experimental results show superior performance of the proposed method over state-of-the-art methods such as LightGBM.

artificial intelligence, deep learning, machine learning

1802.09031

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

#artificialintelligence Feb-24-2018, 05:36:59 GMT

The Random Forest Algorithm – Towards Data Science

Random Forest is a flexible, easy-to-use machine learning algorithm that produces a great result most of the time, even without hyper-parameter tuning. It is also one of the most used algorithms because of its simplicity and the fact that it can be used for both classification and regression tasks. In this post, you are going to learn how the random forest algorithm works and several other important things about it. Random Forest is a supervised learning algorithm. As you can already see from its name, it creates a forest and makes it somehow random.
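To make "creates a forest and makes it somehow random" a little more tangible, here is a deliberately simplified pure-Python sketch. Each "tree" is only a one-split stump trained on a bootstrap sample with one randomly chosen feature, and the forest predicts by majority vote. Real random forests grow full trees and sample features at every split; all names and the toy data here are assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimises training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample (draw rows with replacement).
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Randomness 2: each stump only sees one randomly chosen feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Toy data: two features, two well-separated classes (invented values).
rows = [(1.0, 10.0), (2.0, 11.0), (3.0, 12.0), (7.0, 20.0), (8.0, 21.0), (9.0, 22.0)]
labels = [0, 0, 0, 1, 1, 1]
forest = fit_forest(rows, labels)
```

Calling `predict(forest, (0.5, 9.0))` asks every stump for a vote and returns the most common label. An individual stump can be poor (for example when its bootstrap sample happens to contain a single class), and the vote across many stumps is what smooths such cases out.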

artificial intelligence, machine learning, random forest

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Sabzevari, Maryam, Martínez-Muñoz, Gonzalo, Suárez, Alberto

Vote-boosting ensembles

arXiv.org Machine Learning Feb-21-2018

Vote-boosting is a sequential ensemble learning method in which the individual classifiers are built on different weighted versions of the training data. To build a new classifier, the weight of each training instance is determined in terms of the degree of disagreement among the current ensemble predictions for that instance. For low class-label noise levels, especially when simple base learners are used, emphasis should be placed on instances for which the disagreement rate is high. When more flexible classifiers are used and as the noise level increases, the emphasis on these uncertain instances should be reduced. In fact, at sufficiently high levels of class-label noise, the focus should be on instances on which the ensemble classifiers agree. The optimal type of emphasis can be automatically determined using cross-validation. An extensive empirical analysis using the beta distribution as emphasis function illustrates that vote-boosting is an effective method to generate ensembles that are both accurate and robust.
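The weighting idea in this abstract can be sketched for the binary case as follows: compute, for each training instance, the fraction of ensemble members voting for one class, and pass it through a symmetric beta density used as the emphasis function. The smoothing of the vote fraction, the function names, and the vote counts are assumptions made for this sketch; the paper's exact formulation may differ.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def vote_weights(votes_for_class1, n_classifiers, a, b):
    """Normalised instance weights from the ensemble's vote fractions."""
    # Assumption: smooth the fraction so it stays strictly inside (0, 1),
    # where the beta density is finite even for a, b < 1.
    fractions = [(v + 1) / (n_classifiers + 2) for v in votes_for_class1]
    raw = [beta_pdf(x, a, b) for x in fractions]
    total = sum(raw)
    return [w / total for w in raw]


# Invented vote counts: how many of 10 classifiers voted class 1 per instance.
votes = [10, 5, 0, 7]

# a = b > 1 peaks at 0.5: emphasis on instances where the ensemble disagrees.
w_disagree = vote_weights(votes, 10, a=3.0, b=3.0)

# a = b < 1 peaks near 0 and 1: emphasis on instances where it agrees.
w_agree = vote_weights(votes, 10, a=0.5, b=0.5)
```

With these settings the evenly split instance (5 of 10 votes) gets the largest weight in `w_disagree` and the smallest in `w_agree`. The abstract proposes selecting this type of emphasis by cross-validation.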

artificial intelligence, classifier, inductive learning

1606.09458

Country:

North America > United States (0.46)
Europe > Spain (0.14)
Europe > Germany (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (1.00)
Energy > Oil & Gas > Midstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.67)

#artificialintelligence Feb-18-2018, 05:37:25 GMT

ŷhat Random Forests in Python

Random forest is a highly versatile machine learning method with numerous applications ranging from marketing to healthcare and insurance. It can be used to model the impact of marketing on customer acquisition, retention, and churn or to predict disease risk and susceptibility in patients. Random forest is capable of regression and classification. It can handle a large number of features, and it's helpful for estimating which of your variables are important in the underlying data being modeled. This is a post about random forests using Python.

artificial intelligence, machine learning, random forest

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Gieseke, Fabian, Igel, Christian

Training Big Random Forests with Little Resources

arXiv.org Machine Learning Feb-18-2018

Without access to large compute clusters, building random forests on large datasets is still a challenging problem. This is, in particular, the case if fully-grown trees are desired. We propose a simple yet effective framework that makes it possible to efficiently construct ensembles of huge trees for hundreds of millions or even billions of training instances using a cheap desktop computer with commodity hardware. The basic idea is to consider a multi-level construction scheme, which builds top trees for small random subsets of the available data and which subsequently distributes all training instances to the top trees' leaves for further processing. While being conceptually simple, the overall efficiency crucially depends on the particular implementation of the different phases. The practical merits of our approach are demonstrated using dense datasets with hundreds of millions of training instances.

artificial intelligence, implementation, machine learning

1802.06394

Country:

Europe > Denmark > Capital Region > Copenhagen (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.87)

#artificialintelligence Feb-16-2018, 23:57:52 GMT

Random Forests explained intuitively

Say you interviewed for the position of statistical analyst at WalmartLabs. As at most companies, you don't have just one round of interviews. You have multiple rounds. Each of these interviews is chaired by an independent panel. Generally, even the questions asked differ from one interview to the next.

decision tree learning, interview, machine learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.47)

Lundberg, Scott M., Lee, Su-In

Consistent feature attribution for tree ensembles

arXiv.org Machine Learning Feb-16-2018

Note that a newer expanded version of this paper is now available at: arXiv:1802.03888. It is critical in many applications to understand what features are important for a model, and why individual predictions were made. For tree ensemble methods these questions are usually answered by attributing importance values to input features, either globally or for a single prediction. Here we show that current feature attribution methods are inconsistent, which means changing the model to rely more on a given feature can actually decrease the importance assigned to that feature. To address this problem we develop fast exact solutions for SHAP (SHapley Additive exPlanation) values, which were recently shown to be the unique additive feature attribution method based on conditional expectations that is both consistent and locally accurate. We integrate these improvements into the latest version of XGBoost, demonstrate the inconsistencies of current methods, and show how using SHAP values results in significantly improved supervised clustering performance. Feature importance values are a key part of understanding widely used models such as gradient boosting trees and random forests, so improvements to them have broad practical implications.
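For readers unfamiliar with Shapley-style attribution, the following brute-force sketch averages each feature's marginal contribution over all feature orderings. It is not the fast tree algorithm the paper develops, and it replaces the paper's conditional expectations with a simpler assumption: a "missing" feature is set to a fixed baseline value. The toy model and all names are hypothetical.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every feature ordering.

    Cost grows factorially with the number of features, so this is
    only usable for tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]      # switch feature i from baseline to x
            now = model(current)
            phi[i] += now - prev   # marginal contribution of feature i
            prev = now
    return [p / len(orderings) for p in phi]


def toy_model(v):
    # Tree-like toy function: feature 0 acts alone, features 1 and 2
    # only matter when both are positive.
    return 2.0 * v[0] + (3.0 if v[1] > 0 and v[2] > 0 else 0.0)


phi = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# phi is [2.0, 1.5, 1.5]: the interaction effect of 3.0 is split evenly
# between features 1 and 2.
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the local accuracy property the abstract refers to.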

artificial intelligence, feature attribution, machine learning

1706.0606

Country: North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)