 Decision Tree Learning


Learning Nonlinear Functions Using Regularized Greedy Forest

arXiv.org Machine Learning

We consider the problem of learning a forest of nonlinear decision rules with general loss functions. The standard methods employ boosted decision trees such as AdaBoost for exponential loss and Friedman's gradient boosting for general loss. In contrast to these traditional boosting algorithms that treat a tree learner as a black box, the method we propose directly learns decision forests via fully-corrective regularized greedy search using the underlying forest structure. Our method achieves higher accuracy and smaller models than gradient boosting (and AdaBoost with exponential loss) on many datasets.


Layered Logic Classifiers: Exploring the `And' and `Or' Relations

arXiv.org Machine Learning

Designing effective and efficient classifiers for pattern analysis is a key problem in machine learning and computer vision. Many of the solutions to the problem require performing logic operations such as `and', `or', and `not'. Classification and regression trees (CART) include these operations explicitly. Other methods such as neural networks, SVM, and boosting learn/compute a weighted sum on features (weak classifiers), which weakly performs the `and' and `or' operations. However, it is hard for these classifiers to deal with the `xor' pattern directly. In this paper, we propose layered logic classifiers for patterns of complicated distributions by combining the `and', `or', and `not' operations. The proposed algorithm is very general and easy to implement. We test the classifiers on several typical datasets from the Irvine repository and two challenging vision applications, object segmentation and pedestrian detection. We observe significant improvements on all the datasets over the widely used decision stump based AdaBoost algorithm. The resulting classifiers have much lower training complexity than decision tree based AdaBoost, and can be applied in a wide range of domains.
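The `xor' difficulty mentioned above can be illustrated with a small sketch. This is not the paper's algorithm: the names XOR_DATA, accuracy, stumps, and layered are hypothetical, and the layered rule is hand-written rather than learned. It only shows that no single decision stump beats chance on `xor', while an `or' of two `and' terms fits it exactly.

```python
from itertools import product

# The four xor points: label is 1 exactly when the two binary inputs differ.
XOR_DATA = [((a, b), a ^ b) for a, b in product((0, 1), repeat=2)]

def accuracy(clf):
    """Fraction of xor points that clf labels correctly."""
    return sum(clf(x) == y for x, y in XOR_DATA) / len(XOR_DATA)

# Every decision stump on binary inputs: pick one feature j, optionally flip it.
stumps = [lambda x, j=j, s=s: x[j] ^ s for j in (0, 1) for s in (0, 1)]
best_stump_acc = max(accuracy(s) for s in stumps)

def layered(x):
    """Hand-written two-layer rule: (a and not b) or (not a and b)."""
    a, b = x
    return int((a and not b) or (not a and b))

layered_acc = accuracy(layered)
```

Each stump sees only one feature, and on `xor' each feature value is paired with both labels equally often, which is why a second layer of logic is needed.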


Effects of Sampling Methods on Prediction Quality. The Case of Classifying Land Cover Using Decision Trees

arXiv.org Machine Learning

Clever sampling methods can be used to improve the handling of big data and increase its usefulness. The subject of this study is remote sensing, specifically airborne laser scanning point clouds representing different classes of ground cover. The aim is to derive a supervised learning model for the classification using CARTs. In order to measure the effect of different sampling methods on the classification accuracy, various experiments with varying types of sampling methods, sample sizes, and accuracy metrics have been designed. Numerical results for a subset of a large surveying project covering the lower Rhine area in Germany are shown. General conclusions regarding sampling design are drawn and presented.


A consistent deterministic regression tree for non-parametric prediction of time series

arXiv.org Machine Learning

We study online prediction of bounded stationary ergodic processes. To do so, we consider the setting of prediction of individual sequences and build a deterministic regression tree that performs asymptotically as well as the best L-Lipschitz constant predictors. Then, we show why the obtained regret bound entails asymptotic optimality with respect to the class of bounded stationary ergodic processes.


Automated Classification of Airborne Laser Scanning Point Clouds

arXiv.org Artificial Intelligence

Making sense of the physical world has always been at the core of mapping. Up until recently, this has always depended on using the human eye. Using airborne lasers, it has become possible to quickly "see" more of the world in many more dimensions. The resulting enormous point clouds serve as data sources for applications far beyond the original mapping purposes, ranging from flooding protection and forestry to threat mitigation. In order to process these large quantities of data, novel methods are required. In this contribution, we develop models to automatically classify ground cover and soil types. Using the logic of machine learning, we critically review the advantages of supervised and unsupervised methods. Focusing on decision trees, we improve accuracy by including beam vector components and using a genetic algorithm. We find that our approach delivers consistently high quality classifications, surpassing classical methods.


Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

arXiv.org Machine Learning

A novel hybrid data-driven approach is developed for forecasting power system parameters with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time-series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of variables employed in the forecasting models. The Mean Decrease Gini index is employed as an impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. Section 2 presents the background and a review of several approaches for short-term forecasting of power system parameters. In the third section, a hybrid machine learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. The fourth section describes the decision tree learning algorithms used to assess variable importance. Finally, section six presents the experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting.
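As a rough illustration of the Gini-based importance idea mentioned above, the sketch below computes the Gini impurity of a set of class labels and the impurity decrease produced by one split. The function names gini and gini_decrease are hypothetical and not taken from the paper; a Mean Decrease Gini score is commonly obtained by accumulating such decreases for each variable and averaging over the trees of a forest, which this sketch does not do.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

# A split that separates the two classes perfectly removes all impurity.
perfect = gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])
# A split that leaves both children as mixed as the parent removes none.
useless = gini_decrease([0, 0, 1, 1], [0, 1], [0, 1])
```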


Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife

arXiv.org Machine Learning

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2012) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B on the order of n^{1.5} bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B on the order of n replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.
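A minimal sketch of the direct (uncorrected) IJ variance estimate for a bagged predictor follows. It uses the form V_IJ = sum_i Cov_b(N_bi, t_b)^2, where N_bi counts how often observation i appears in bootstrap replicate b and t_b is that replicate's prediction. The function name ij_variance, its parameters, and the use of a plain sample mean as the base learner are assumptions for illustration; the improved finite-B versions proposed in the paper are not implemented here.

```python
import random

def ij_variance(data, predict, n_boot=500, seed=0):
    """Uncorrected infinitesimal jackknife variance of a bagged prediction.

    `predict` maps a bootstrap sample (a list) to a single number.
    """
    rng = random.Random(seed)
    n = len(data)
    counts = []  # counts[b][i] = N_bi, appearances of observation i in replicate b
    preds = []   # preds[b] = t_b, prediction from replicate b
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = [0] * n
        for i in idx:
            c[i] += 1
        counts.append(c)
        preds.append(predict([data[i] for i in idx]))
    t_mean = sum(preds) / n_boot
    total = 0.0
    for i in range(n):
        n_mean = sum(c[i] for c in counts) / n_boot
        # Covariance over replicates between N_bi and t_b.
        cov = sum((c[i] - n_mean) * (t - t_mean)
                  for c, t in zip(counts, preds)) / n_boot
        total += cov * cov
    return total

def sample_mean(xs):
    """Toy base learner: predicts the mean of its training sample."""
    return sum(xs) / len(xs)

estimate = ij_variance([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], sample_mean, n_boot=200, seed=1)
```

With a finite number of replicates the estimate carries Monte Carlo noise from the bootstrap itself, which is the motivation for the corrected estimators the abstract describes.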


Non-uniform Feature Sampling for Decision Tree Ensembles

arXiv.org Machine Learning

We study the effectiveness of non-uniform randomized feature selection in decision tree classification. We experimentally evaluate two feature selection methodologies, based on information extracted from the provided dataset: (i) leverage scores-based and (ii) norm-based feature selection. Experimental evaluation of the proposed feature selection techniques indicates that such approaches might be more effective than naive uniform feature selection and, moreover, have performance comparable to the random forest algorithm [3].


The Random Forest Kernel and other kernels for big data from random partitions

arXiv.org Machine Learning

We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lends itself to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
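The connection between random partitions and kernels can be sketched as follows: the kernel value for two objects is estimated as the fraction of sampled partitions that place them in the same cell. The partition scheme below (a few random axis-aligned cuts) is a toy assumption chosen for brevity; it is neither the Random Forest Kernel nor the Fast Cluster Kernel, and the function name and parameters are hypothetical.

```python
import random

def random_partition_kernel(points, n_partitions=200, n_cuts=3, seed=0):
    """Estimate K[i][j] = P(points i and j fall in the same random cell)."""
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]
    same_cell = [[0] * n for _ in range(n)]
    for _ in range(n_partitions):
        # Toy partition: n_cuts random axis-aligned thresholds.
        cuts = []
        for _ in range(n_cuts):
            d = rng.randrange(dim)
            cuts.append((d, rng.uniform(lo[d], hi[d])))
        # A point's cell is identified by which side of each cut it lies on.
        cells = [tuple(p[d] > t for d, t in cuts) for p in points]
        for i in range(n):
            for j in range(n):
                if cells[i] == cells[j]:
                    same_cell[i][j] += 1
    return [[count / n_partitions for count in row] for row in same_cell]

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.0)]
K = random_partition_kernel(pts)
```

Nearby points are rarely separated by a random cut, so they receive kernel values close to 1, while distant points are separated often and receive values close to 0.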


Prediction with Missing Data via Bayesian Additive Regression Trees

arXiv.org Machine Learning

This article addresses prediction problems where covariate information is missing during model construction and is also missing in future observations for which we are obligated to generate a forecast. Our aim is to develop a nonparametric statistical learning extension that incorporates missingness into both the training and the forecasting phases. In the spirit of nonparametric learning, we wish to incorporate the missingness in both phases automatically, without the need for pre-specified modeling. We limit our focus to tree-based statistical learning, which has demonstrated strong predictive performance and has consequently received considerable attention in recent years. State-of-the-art algorithms include Random Forests (RF, Breiman, 2001b), stochastic gradient boosting (Friedman, 2002), and Bayesian Additive Regression Trees (BART, Chipman et al., 2010), the algorithm of interest in this study.