

 Decision Tree Learning


The Machine Learning Abstracts (Part 2): Decision Trees

#artificialintelligence

The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree. Let us analyze decision trees in depth using the Spring cleaning example from the previous part on classification, and see how a decision tree is constructed. Constructing a decision tree involves repeatedly splitting the data on one feature at a time. In our example, each feature has only two possible values ("yes" and "no"), so every split produces two branches.
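To make the splitting step concrete, here is a minimal ID3-style sketch. The Spring cleaning data from the previous part is not reproduced here, so the two yes/no features (`weekend`, `sunny`) and the labels are hypothetical, and information gain is only one common way of choosing which feature to split on.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on a yes/no feature."""
    gain = entropy(labels)
    for value in ("yes", "no"):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def majority(labels):
    """Most common label (used for leaves that cannot be split further)."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    """Recursively split on the feature with the highest information gain."""
    if len(set(labels)) == 1:      # pure node: make a leaf
        return labels[0]
    if not features:               # nothing left to split on: majority leaf
        return majority(labels)
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    node = {"feature": best, "branches": {}}
    for value in ("yes", "no"):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # An empty branch falls back to the parent's majority label.
        node["branches"][value] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
            if idx else majority(labels))
    return node

def predict(tree, row):
    """Follow the yes/no branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["feature"]]]
    return tree

# Hypothetical training data: should we do the Spring cleaning today?
rows = [
    {"weekend": "yes", "sunny": "yes"},
    {"weekend": "yes", "sunny": "no"},
    {"weekend": "no", "sunny": "yes"},
    {"weekend": "no", "sunny": "no"},
]
labels = ["clean", "clean", "skip", "skip"]
tree = build_tree(rows, labels, ["weekend", "sunny"])
print(tree)  # {'feature': 'weekend', 'branches': {'yes': 'clean', 'no': 'skip'}}
```

Because `weekend` alone separates the labels, it has an information gain of one bit and becomes the root; `sunny` is never needed.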



The Machine Learning Abstracts: Decision Trees

@machinelearnbot

Decision tree learning is a classic machine learning algorithm used for both classification and regression. The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree.
[Figure: A tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard).]
Let us analyze decision trees in depth using the Spring cleaning example from the previous part on classification, and see how a decision tree is constructed.
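Since the same algorithm also handles regression, the sketch below shows how a regression tree might choose a single split on one numeric feature: it tries every midpoint between sorted values and keeps the threshold that minimises the children's summed squared error, the CART-style criterion. The numbers are made up for illustration and are unrelated to the Titanic data.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_regression_split(x, y):
    """Return (threshold, total_sse) for the split `x <= threshold` that
    minimises the summed squared error of the two child nodes.
    threshold is None if no split beats a single leaf."""
    pairs = sorted(zip(x, y))
    best = (None, sse([v for _, v in pairs]))  # baseline: one leaf predicting the mean
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: two clusters of target values.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 5.5, 20.0, 21.0, 19.5]
threshold, total = best_regression_split(x, y)
print(threshold)  # 6.5
```

Each leaf of a regression tree then predicts the mean target of the samples that reach it (here 5.5 on the left and about 20.17 on the right).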


Latent tree models

arXiv.org Machine Learning

Latent tree models are graphical models defined on trees in which only a subset of the variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions, generalising star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in phylogenetic analysis, network tomography, computer vision, causal modeling and data clustering. They also contain other well-known classes of models, such as hidden Markov models, the Brownian motion tree model, the Ising model on a tree and many popular models used in phylogenetics. We offer here a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding the fundamental limits of what can be learned and when. We present Gaussian and general Markov models as subclasses of latent tree models that admit tractable and rigorous analysis.

A leaf of a tree T is a vertex of degree one, an internal vertex is a vertex that is not a leaf, and an inner edge is an edge whose two endpoints are both internal vertices. Given a tree T, a rooted tree is the directed graph obtained from T by picking one of its vertices r and directing all edges away from r; the vertex r is called the root. Trees are always leaf-labeled with the labelling set {1, ..., m}, where m is the number of leaves. An undirected tree is trivalent if each internal vertex has degree exactly three. A rooted tree is a binary rooted tree if each internal vertex has exactly two children. In many applications rooted trees are drawn without arrows, with the direction implied by placing the root at the top and the leaves at the bottom; see Figure 1(c). Two special types of undirected trees are the star tree, which has a single internal vertex, and the trivalent tree on four leaves, called a quartet tree; see Figure 1(a) and (b). A forest is a collection of trees.
Forests here are also leaf-labeled with the labelling set {1, ..., m}, which means that each tree in the collection is leaf-labeled and the corresponding collection of labelling sets forms a set partition of {1, ..., m}. We define three graph operations on trees (forests). Removing an edge means removing that edge from the edge set. Contracting an edge uv means removing u and v from the vertex set and adding a new vertex w, together with edges such that w is adjacent to all vertices that were adjacent to u or v. Suppressing a vertex of degree two means removing that vertex and replacing the two edges incident to it by a single edge.

Figure 1: (a) An undirected star tree with five leaves, (b) a quartet tree, (c) a binary rooted tree.
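The tree definitions and the three graph operations can be sketched in Python, assuming a tree is stored as an adjacency dictionary that maps each vertex to the set of its neighbours. The quartet tree `QUARTET`, its internal vertex names `"a"` and `"b"`, and the new vertex name `"w"` passed to `contract_edge` are hypothetical choices for illustration; every operation returns a new dictionary instead of modifying its input.

```python
from collections import deque

def _copy(adj):
    """Deep-enough copy so that operations never mutate their input."""
    return {v: set(ns) for v, ns in adj.items()}

def leaves(adj):
    """Leaves are the vertices of degree one."""
    return {v for v, ns in adj.items() if len(ns) == 1}

def inner_edges(adj):
    """Edges whose two endpoints are both internal (non-leaf) vertices."""
    lf = leaves(adj)
    return {frozenset((u, v)) for u in adj for v in adj[u]
            if u not in lf and v not in lf}

def root_at(adj, r):
    """Direct every edge away from the chosen root r (breadth-first search)."""
    directed, seen, queue = [], {r}, deque([r])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                directed.append((u, w))
                queue.append(w)
    return directed

def remove_edge(adj, u, v):
    """Delete edge u-v; the vertex set is unchanged, so a tree becomes a forest."""
    adj = _copy(adj)
    adj[u].discard(v)
    adj[v].discard(u)
    return adj

def contract_edge(adj, u, v, w):
    """Replace u and v by a new vertex w adjacent to all their other neighbours."""
    adj = _copy(adj)
    neighbours = (adj.pop(u) | adj.pop(v)) - {u, v}
    for x in neighbours:
        adj[x] -= {u, v}
        adj[x].add(w)
    adj[w] = neighbours
    return adj

def suppress_vertex(adj, x):
    """Remove a degree-two vertex x and join its two neighbours by one edge."""
    if len(adj[x]) != 2:
        raise ValueError("only vertices of degree two can be suppressed")
    adj = _copy(adj)
    a, b = adj.pop(x)
    adj[a].discard(x)
    adj[a].add(b)
    adj[b].discard(x)
    adj[b].add(a)
    return adj

# Hypothetical quartet tree: leaves 1-4, internal vertices "a" and "b".
QUARTET = {1: {"a"}, 2: {"a"}, 3: {"b"}, 4: {"b"},
           "a": {1, 2, "b"}, "b": {3, 4, "a"}}
star = contract_edge(QUARTET, "a", "b", "w")  # star tree on four leaves
forest = remove_edge(QUARTET, "a", "b")        # two components
joined = suppress_vertex(forest, "a")          # leaves 1 and 2 now adjacent
```

Contracting the single inner edge of the quartet tree yields the star tree on four leaves, while removing that edge and then suppressing the resulting degree-two vertex `"a"` joins leaves 1 and 2 directly.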


How machine learning and financial technology are transforming the lending sector

#artificialintelligence

The lending ecosystem around the world has been at the centre of significant changes in the last decade. From financial technology disrupting the financial services industry with highly efficient and cost-effective processes, to stringent regulations following the 2008 global financial crisis, growing technological intervention has played a significant role in the rapid evolution of the lending industry. One such technology is machine learning, which has begun to create new and highly promising avenues in the lending market. Machine learning applies predictive statistical techniques (such as logistic regression, random forests and decision trees) to large data sets and derives decisions and insights from the data it processes. Machines can be trained on any form of electronically stored data, such as text, images and speech, using these algorithms to identify behaviours and patterns and then generate similar predictions on a new data set. Fintech companies are increasingly using machine learning algorithms in their operations to build efficient and effective systems.


Dealing with Unbalanced Classes, SVMs, Random Forests, and Decision Trees in Python

@machinelearnbot

So far I have talked about decision trees and ensembles, and I hope I have helped you understand the logic behind these concepts without going too deep into the mathematical details. In this post, let's get into action: I will implement the concepts we learned in those two blog posts. The only concept I haven't discussed is the SVM; I suggest watching Professor Andrew Ng's week 7 videos on Coursera.
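For the unbalanced-classes part, one common remedy is to reweight classes inversely to their frequency. The helper below is a minimal pure-Python sketch of the n_samples / (n_classes * class_count) heuristic that scikit-learn documents for `class_weight="balanced"`; the function name and the 90/10 label split are invented for illustration, not taken from the post.

```python
from collections import Counter

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rare classes count more when the loss is summed over training samples."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical unbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights[1])  # 5.0
```

With these weights every class contributes the same total weight (n_samples / n_classes), so the 10 minority samples count as much in aggregate as the 90 majority samples.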


Discover structure behind data with decision trees - Vooban

#artificialintelligence

Let's understand and model the hidden structure behind data with decision trees. In this tutorial, we'll explore and inspect how a model makes its decisions on a car evaluation data set. Decision trees work with simple "if" clauses chained together as two-way tests, recursively splitting the data flow on those "if"s until it reaches a leaf where the data can be categorized. Such data inspection could be used to reverse engineer the behavior of any function. Since decision trees are good at discovering the structure hidden behind data, we'll use them to model the car evaluation data set, for which the prediction problem is a (deterministic) surjective function.
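The "chained if clauses" view can be sketched directly. The tree below is hand-written for illustration over two of the car evaluation attributes (`persons`, `safety`); it is not a tree learned from that data set, and the nested-tuple node format is an assumption of this sketch. `to_rules` turns each root-to-leaf path into a readable rule, which is the kind of output used to reverse engineer a model's behaviour.

```python
# A node is either a leaf label (str) or a tuple
# (feature, value, subtree_if_equal, subtree_otherwise).
# Hypothetical, hand-written tree: NOT learned from the car evaluation data.
TOY_TREE = ("persons", "2", "unacc",
            ("safety", "low", "unacc", "acc"))

def predict(tree, sample):
    """Follow the chained 'if' tests until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, value, if_true, if_false = tree
        tree = if_true if sample[feature] == value else if_false
    return tree

def to_rules(tree, conditions=()):
    """List every root-to-leaf path as a (condition, label) rule."""
    if not isinstance(tree, tuple):
        return [(" and ".join(conditions) or "always", tree)]
    feature, value, if_true, if_false = tree
    return (to_rules(if_true, conditions + (f"{feature} == {value!r}",))
            + to_rules(if_false, conditions + (f"{feature} != {value!r}",)))

for condition, label in to_rules(TOY_TREE):
    print(f"if {condition}: {label}")
# if persons == '2': unacc
# if persons != '2' and safety == 'low': unacc
# if persons != '2' and safety != 'low': acc
```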


A Practical Guide to Tree Based Learning Algorithms

#artificialintelligence

Tree-based learning algorithms are common in data science competitions. They give predictive models high accuracy, stability and ease of interpretation. Unlike linear models, they capture non-linear relationships well. Common examples of tree-based models are decision trees, random forests and boosted trees. In this post, we will look at the mathematical details of decision trees (along with various Python examples), as well as their advantages and drawbacks. We will find that they are simple and very useful for interpretation; however, on their own they typically are not competitive with the best supervised learning approaches.
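As a taste of the mathematical details, here is a minimal sketch of Gini impurity, one of the standard split criteria used by CART-style decision trees. The labels and the two candidate splits are hypothetical; the post's own Python examples are not reproduced here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k**2, where p_k is the share of class k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical node with five "a" and five "b" samples, and two candidate splits.
parent = ["a"] * 5 + ["b"] * 5
mixed_split = (["a"] * 4 + ["b"], ["a"] + ["b"] * 4)
clean_split = (["a"] * 5, ["b"] * 5)
print(gini(parent))                 # 0.5
print(weighted_gini(*clean_split))  # 0.0
```

The split that sends every `"a"` left and every `"b"` right has weighted impurity 0, whereas the 4-versus-1 split only lowers it from 0.5 to 0.32, so a greedy tree builder would prefer the former.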


Cost-complexity pruning of random forests

arXiv.org Machine Learning

Random forests perform bootstrap aggregation by sampling the training samples with replacement. This enables evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training samples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples for post-pruning, first on the generalization error of the individual decision trees and then on that of the random forest. A preliminary empirical study on four UCI repository datasets shows a consistent decrease in the size of the forests without considerable loss in accuracy.
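The out-of-bag (OOB) samples that this approach reuses come straight from bootstrap sampling. The sketch below draws one bootstrap sample of row indices and returns the rows it missed; the function name and seed are illustrative, and it does not reproduce the paper's pruning procedure.

```python
import random

def bootstrap_with_oob(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample) and
    return them together with the out-of-bag indices, i.e. rows never drawn."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob

rng = random.Random(0)  # fixed seed so the sketch is reproducible
in_bag, oob = bootstrap_with_oob(1000, rng)
print(len(oob) / 1000)  # fraction of rows left out of this tree's sample
```

Each row is left out with probability (1 - 1/n)^n, which approaches e^(-1) (about 0.368) as n grows, so roughly a third of the training set is available to evaluate, and potentially prune, each tree.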


Using SparkML to Power a DSaaS (Data Science as a Service)

#artificialintelligence

Almost all organizations now need data science, and the main challenge after choosing an algorithm is to scale it up and make it operational. In this talk we will show how many common use cases rely on common algorithms such as logistic regression, random forests, decision trees, clustering and NLP. Spark has several machine learning algorithms built in and scales excellently. Hence we at Comcast built a platform that provides DSaaS on top of Spark, with a REST API for controlling and submitting jobs, so that most users are spared the rigor of writing (and repeating) code and can focus instead on the actual requirements. We will show how we solved some of the problems of building feature vectors, choosing algorithms and then deploying models into production. We will showcase our use of Scala, R and Python to implement models in the language of choice while still deploying quickly into production on 500-node Spark clusters.