 Decision Tree Learning


Learn Machine Learning with Weka Udemy

#artificialintelligence

This is a bite-size course for learning Weka and machine learning. It covers the machine learning part of the CRISP-DM data mining process, that is, the Modeling and Evaluation phases. You will learn linear regression, k-means clustering, agglomerative clustering, KNN, naive Bayes, and neural networks in this course.


Pro Machine Learning Algorithms [PDF] - Programmer Books

#artificialintelligence

Bridge the gap between a high-level understanding of how an algorithm works and knowing the nuts and bolts to tune your models better. This book will give you the confidence and skills to develop all the major machine learning models. In Pro Machine Learning Algorithms, you will first develop the algorithm in Excel so that you get a practical understanding of all the levers that can be tuned in a model, before implementing the models in Python/R. You will cover all the major algorithms in supervised and unsupervised learning, including linear/logistic regression; k-means clustering; PCA; recommender systems; decision trees; random forests; GBM; and neural networks. You will also be exposed to the latest in deep learning through CNNs, RNNs, and word2vec for text mining.


Decision Tree (CART) - Machine Learning Fun and Easy

#artificialintelligence

Decision Tree (CART) - Machine Learning Fun and Easy https://www.udemy.com/machine-learnin... A decision tree is a type of supervised learning algorithm (one with a pre-defined target variable) that is mostly used in classification problems. Trees have many analogies in real life, and it turns out that they have influenced a wide area of machine learning, covering both classification and regression (CART). A decision tree is a flow-chart-like structure in which each internal node denotes a test on an attribute, each branch represents an outcome of that test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node. To learn more about Augmented Reality, IoT, Machine Learning, FPGAs, Arduinos, PCB Design and Image Processing, check out http://www.arduinostartups.com/
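The flow-chart structure described above can be sketched as a small data structure. This is only an illustrative sketch: the class names `Node` and `Leaf`, the function `classify`, and the toy weather attributes are all assumptions made for the example and do not come from the video.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label held by a leaf (terminal) node


@dataclass
class Node:
    attribute: str               # attribute tested at this internal node
    branches: Dict[str, "Tree"]  # one child per possible outcome of the test


Tree = Union[Leaf, Node]


def classify(tree, example):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label


# Hypothetical hand-built tree; the root node tests "outlook".
toy_tree = Node("outlook", {
    "sunny": Node("windy", {"yes": Leaf("stay in"), "no": Leaf("play")}),
    "rainy": Leaf("stay in"),
})

print(classify(toy_tree, {"outlook": "sunny", "windy": "no"}))  # play
```

A learning algorithm such as CART would choose the attributes and branches from data; here they are written by hand purely to show the structure.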


Making Sense of Random Forest Probabilities: a Kernel Perspective

arXiv.org Machine Learning

A random forest is a popular tool for estimating probabilities in machine learning classification tasks. However, the means by which this is accomplished is unprincipled: one simply counts the fraction of trees in a forest that vote for a certain class. In this paper, we forge a connection between random forests and kernel regression. This places random forest probability estimation on more sound statistical footing. As part of our investigation, we develop a model for the proximity kernel and relate it to the geometry and sparsity of the estimation problem. We also provide intuition and recommendations for tuning a random forest to improve its probability estimates.
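The vote-counting estimate that the abstract calls unprincipled can be sketched in a few lines. The function name `forest_proba` and the hand-written "trees" (plain functions returning a class label) are hypothetical stand-ins for a trained forest, used here only to show the counting step.

```python
from collections import Counter


def forest_proba(trees, x):
    """Estimate class probabilities as the fraction of trees voting for each class."""
    votes = Counter(tree(x) for tree in trees)
    return {label: count / len(trees) for label, count in votes.items()}


# Hypothetical forest: each "tree" is a simple rule that returns a class label.
toy_forest = [
    lambda x: "a" if x[0] > 0.5 else "b",
    lambda x: "a" if x[1] > 0.2 else "b",
    lambda x: "a" if x[0] + x[1] > 1.5 else "b",
    lambda x: "b",
]

print(forest_proba(toy_forest, (0.9, 0.3)))  # {'a': 0.5, 'b': 0.5}
```

Note that some libraries average per-leaf class frequencies instead of counting hard votes, so this sketch should not be read as any particular library's behaviour.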


Machine Learning in Official Statistics

arXiv.org Machine Learning

On 10 October 2017, development of a Digital Agenda of the Federal Statistical Office of Germany (Destatis) began (Statistisches Bundesamt 2018). One of the many topics discussed intensively was Machine Learning. In a meeting on 13-15 November 2017, the office and department heads of Destatis evaluated and prioritised 59 measures of the Digital Agenda according to their benefits and costs. A "Proof of Concept Machine Learning" was given high priority and classified as one of four lighthouse projects of the Digital Agenda. The content specification was "Proof of Concept Machine Learning - Set up Proof of Concept for Machine Learning, e.g. in business statistics, to perform automatic categorization and improve analysis potential". The deadline for completion of the project was set for mid-2018.


Can we learn where people go?

arXiv.org Machine Learning

In most agent-based simulators, pedestrians navigate from origins to destinations. Consequently, destinations are essential input parameters to the simulation. While many other relevant parameters, such as positions, speeds and densities, can be obtained from sensors like cameras, destinations cannot be observed directly. Our research question is: Can we obtain this information from video data using machine learning methods? We use density heatmaps, which indicate the pedestrian density within a given camera cutout, as input to predict the destination distributions. For our proof of concept, we train a Random Forest predictor on an exemplary data set generated with the VADERE microscopic simulator. The scenario is a crossroad where pedestrians can head left, straight or right. In addition, we gain first insights into suitable placement of the camera. The results motivate an in-depth analysis of the methodology.
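To make the idea of a density heatmap concrete, the sketch below bins pedestrian positions into a grid and reports pedestrians per unit area in each cell. This is an assumption made for illustration: the function name `density_heatmap`, the grid parameters, and the simple count-per-cell measure are hypothetical and are not necessarily the density definition used in the paper.

```python
import math


def density_heatmap(positions, width, height, cell):
    """Return a rows x cols grid of pedestrians per unit area inside the cutout."""
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)
    counts = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Ignore pedestrians outside the camera cutout.
        if not (0 <= x < width and 0 <= y < height):
            continue
        counts[int(y // cell)][int(x // cell)] += 1
    area = cell * cell
    return [[n / area for n in row] for row in counts]


# Hypothetical positions in metres inside a 4 m x 2 m cutout with 1 m cells.
positions = [(0.5, 0.5), (1.5, 0.5), (1.7, 0.2), (3.9, 1.9), (7.0, 1.0)]
heatmap = density_heatmap(positions, width=4, height=2, cell=1)
for row in heatmap:
    print(row)
```

Flattening such a grid into a feature vector is one plausible way to feed it to a predictor such as a random forest.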


MLIC: A MaxSAT-Based framework for learning interpretable classification rules

arXiv.org Artificial Intelligence

The wide adoption of machine learning approaches in industry, government, medicine and science has renewed the interest in interpretable machine learning: many decisions are too important to be delegated to black-box techniques such as deep neural networks or kernel SVMs. Historically, problems of learning interpretable classifiers, including classification rules or decision trees, have been approached by greedy heuristic methods, as essentially all the exact optimization formulations are NP-hard. Our primary contribution is a MaxSAT-based framework, called MLIC, which allows principled search for interpretable classification rules expressible in propositional logic. Our approach benefits from the revolutionary advances in the constraint satisfaction community to solve large-scale instances of such problems. In experimental evaluations over a collection of benchmarks arising from practical scenarios, we demonstrate its effectiveness: we show that the formulation can solve large classification problems with tens or hundreds of thousands of examples and thousands of features, and that it provides a tunable balance of accuracy vs. interpretability. Furthermore, we show that in many problems interpretability can be obtained at only a minor cost in accuracy. The primary objective of the paper is to show that recent advances in the MaxSAT literature make it realistic to find optimal (or very high quality near-optimal) solutions to large-scale classification problems. The key goal of the paper is to excite researchers in both interpretable classification and in the CP community to take it further and propose richer formulations, and to develop bespoke solvers attuned to the problem of interpretable ML.
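As a rough illustration of what a classification rule in propositional logic and a tunable accuracy vs. interpretability trade-off can look like, the sketch below evaluates a CNF rule over binary features and scores it by rule size plus a weighted error count. The names `cnf_predict` and `rule_cost`, the weight `lam`, and the cost formula are assumptions for this example; this is not the paper's MaxSAT encoding, and no solver is involved.

```python
def cnf_predict(rule, x):
    """rule is a list of clauses; each clause is a list of feature indices.

    Features within a clause are OR'ed, and the clauses are AND'ed together.
    """
    return all(any(x[i] for i in clause) for clause in rule)


def rule_cost(rule, X, y, lam):
    """Illustrative cost: number of literals in the rule plus lam per misclassified example."""
    errors = sum(cnf_predict(rule, x) != label for x, label in zip(X, y))
    size = sum(len(clause) for clause in rule)
    return size + lam * errors


# Hypothetical binary data set with three features.
X = [(1, 0, 1), (0, 1, 1), (0, 0, 1), (1, 1, 0)]
y = [True, True, False, False]

accurate_rule = [[0, 1], [2]]  # (x0 OR x1) AND x2: larger rule, no errors
short_rule = [[2]]             # x2 alone: smaller rule, one error

print(rule_cost(accurate_rule, X, y, lam=10))  # 3
print(rule_cost(short_rule, X, y, lam=10))     # 11
```

With a small `lam` the shorter rule becomes cheaper than the accurate one, which is the kind of trade-off a tunable weight controls.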


An empirical study on hyperparameter tuning of decision trees

arXiv.org Machine Learning

Machine learning algorithms often contain many hyperparameters whose values affect the predictive performance of the induced models in intricate ways. Due to the high number of possibilities for these hyperparameter configurations, and their complex interactions, it is common to use optimization techniques to find settings that lead to high predictive accuracy. However, we lack insight into how to efficiently explore this vast space of configurations: which are the best optimization techniques, how should we use them, and how significant is their effect on predictive or runtime performance? This paper provides a comprehensive approach for investigating the effects of hyperparameter tuning on three Decision Tree induction algorithms, CART, C4.5 and CTree. These algorithms were selected because they are based on similar principles, have presented a high predictive performance in several previous works and induce interpretable classification models. Additionally, they contain many interacting hyperparameters to be adjusted. Experiments were carried out with different tuning strategies to induce models and evaluate the relevance of hyperparameters using 94 classification datasets from OpenML. Experimental results indicate that hyperparameter tuning provides statistically significant improvements for C4.5 and CTree in only one-third of the datasets, and in most of the datasets for CART. Different tree algorithms may present different tuning scenarios, but in general, the tuning techniques required relatively few iterations to find accurate solutions. Furthermore, the best technique for all the algorithms was Irace. Finally, we find that tuning a specific small subset of hyperparameters contributes most of the achievable optimal predictive performance.
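For readers unfamiliar with what a tuning technique does, the sketch below shows random search, one of the simplest strategies, over a small grid of tree hyperparameters. Everything named here is an assumption for illustration: `random_search`, the hyperparameter names `max_depth` and `min_samples_split`, and especially `toy_score`, which is a made-up stand-in for training a tree and measuring its validation accuracy. It is not the paper's experimental setup and not Irace.

```python
import random


def random_search(evaluate, space, n_iter, seed=0):
    """Sample n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iter):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical search space for a CART-like learner.
space = {
    "max_depth": [1, 2, 4, 8, 16],
    "min_samples_split": [2, 5, 10, 20],
}


def toy_score(cfg):
    # Stand-in objective (NOT a real accuracy measurement): it peaks at
    # max_depth=4, min_samples_split=5 so the search has something to find.
    return -abs(cfg["max_depth"] - 4) - 0.1 * abs(cfg["min_samples_split"] - 5)


best_cfg, best_score = random_search(toy_score, space, n_iter=50)
print(best_cfg, best_score)
```

In practice `evaluate` would fit a tree with the sampled settings and return a cross-validated score.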


Please Stop Explaining Black Box Models for High Stakes Decisions

arXiv.org Machine Learning

Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward - it is to design models that are inherently interpretable.


r/MachineLearning - [D] Regression Decision Tree from Scratch

#artificialintelligence

I'm looking for an implementation of a Regression Tree from scratch and have only been able to find classification trees.
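For reference, a regression tree differs from a classification tree mainly in the split criterion and the leaf value. The following is a minimal from-scratch sketch, assuming numeric features, splits chosen to minimise the summed squared error of the two children, and leaves that predict the mean target. The function names and the dict-based tree layout are choices made for this example, not a reference implementation.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((v - mean) ** 2 for v in ys)


def best_split(X, y):
    """Return (total_sse, feature, threshold) of the best split, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            total = sse(left) + sse(right)
            if best is None or total < best[0]:
                best = (total, j, t)
    return best


def build_tree(X, y, max_depth=3, min_samples=2):
    """Recursively grow a tree; a leaf is a float, an internal node is a dict."""
    split = best_split(X, y) if max_depth > 0 and len(y) >= min_samples else None
    if split is None or split[0] >= sse(y):
        return sum(y) / len(y)  # leaf: mean of the targets that reached it
    _, j, t = split
    left = [(row, v) for row, v in zip(X, y) if row[j] <= t]
    right = [(row, v) for row, v in zip(X, y) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [v for _, v in left],
                           max_depth - 1, min_samples),
        "right": build_tree([r for r, _ in right], [v for _, v in right],
                            max_depth - 1, min_samples),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its mean value."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Tiny made-up data set with one feature and two clear target levels.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = build_tree(X, y)
print(predict(tree, [2.5]), predict(tree, [20.0]))  # 1.0 5.0
```

The exhaustive threshold scan recomputes the error for every candidate, which is fine for small data but slow on large data sets; sorting once per feature and updating running sums is the usual speed-up.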