Decision Tree Learning
Ensembled Correlation Between Liver Analysis Outputs
Seker, Sadi Evren, Unal, Y., Erdem, Z., Kocer, H. Erdinc
Data mining techniques on the biological analysis are spreading for most of the areas including the health care and medical information. We have applied the data mining techniques, such as KNN, SVM, MLP or decision trees over a unique dataset, which is collected from 16,380 analysis results for a year. Furthermore we have also used meta-classifiers to question the increased correlation rate between the liver disorder and the liver analysis outputs. The results show that there is a correlation among ALT, AST, Billirubin Direct and Billirubin Total down to 15% of error rate. Also the correlation coefficient is up to 94%. This makes possible to predict the analysis results from each other or disease patterns can be applied over the linear correlation of the parameters.
Regression-tree Tuning in a Streaming Setting
Kpotufe, Samory, Orabona, Francesco
We consider the problem of maintaining the data-structures of a partition-based regression procedure in a setting where the training data arrives sequentially over time. We prove that it is possible to maintain such a structure in time $O(\log n)$ at any time step $n$ while achieving a nearly-optimal regression rate of $\tilde{O}(n^{-2/(2+d)})$ in terms of the unknown metric dimension $d$. Finally we prove a new regression lower-bound which is independent of a given data size, and hence is more appropriate for the streaming setting.
Understanding variable importances in forests of randomized trees
Louppe, Gilles, Wehenkel, Louis, Sutera, Antonio, Geurts, Pierre
Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. We derive a three-level decomposition of the information jointly provided by all input variables about the output in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, iii) the different interaction terms of a given degree. We then show that this MDI importance of a variable is equal to zero if and only if the variable is irrelevant and that the MDI importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.
Decision Jungles: Compact and Rich Models for Classification
Shotton, Jamie, Sharp, Toby, Kohli, Pushmeet, Nowozin, Sebastian, Winn, John, Criminisi, Antonio
Randomized decision trees and forests have a rich history in machine learning and have seen considerable success in application, perhaps particularly so for computer vision. However, they face a fundamental limitation: given enough data, the number of nodes in decision trees will grow exponentially with depth. For certain applications, for example on mobile or embedded processors, memory is a limited resource, and so the exponential growth of trees limits their depth, and thus their potential accuracy. This paper proposes decision jungles, revisiting the idea of ensembles of rooted decision directed acyclic graphs (DAGs), and shows these to be compact and powerful discriminative models for classification. Unlike conventional decision trees that only allow one path to every node, a DAG in a decision jungle allows multiple paths from the root to each leaf. We present and compare two new node merging algorithms that jointly optimize both the features and the structure of the DAGs efficiently. During training, node splitting and node merging are driven by the minimization of exactly the same objective function, here the weighted sum of entropies at the leaves. Results on varied datasets show that, compared to decision forests and several other baselines, decision jungles require dramatically less memory while considerably improving generalization.
Narrowing the Gap: Random Forests In Theory and In Practice
Denil, Misha, Matheson, David, de Freitas, Nando
Despite widespread interest and practical use, the theoretical properties of random forests are still not well understood. In this paper we contribute to this understanding in two ways. We present a new theoretically tractable variant of random regression forests and prove that our algorithm is consistent. We also provide an empirical evaluation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in practice. Our experiments provide insight into the relative importance of different simplifications that theoreticians have made to obtain tractable models for analysis.
Random Forests on Distance Matrices for Imaging Genetics Studies
Sim, Aaron, Tsagkrasoulis, Dimosthenis, Montana, Giovanni
The clinical pathology of neurological diseases and the imaging of the human brain are two areas of research that have largely developed along independent lines. It is only in the past few years that the usefulness of noninvasive imaging measurements of the human brain to the diagnosis and early prediction of neurological diseases been widely recognised (Albert et al., 2011; Sperling et al., 2011; Gray et al., 2013). In Alzheimer's Disease (AD), for instance, clinical guidance on the diagnosis of this most common of neurological degenerative disorders has recently been updated to incorporate neuroimaging markers alongside standard cognitive and behavioural tests (Albert et al., 2011; Sperling et al., 2011). The key to the improved characterisation of AD lies in the quantitative nature of the imaging measurements compared to the relatively subjective and imprecise nature of traditional clinical assessments. Imaging biomarkers of cerebral atrophy and of loss of connectivity between key regions in the brain are believed to be reliable indicators of AD and are particularly useful at early disease stages when standard cognitive assessments can be inconclusive. The utility of imaging phenotypes extends beyond diagnosis and prediction to the search for the underlying genetic factors behind neurological disorders (Stein et al., 2010). This comparatively more recent use of neuroimaging measurements in place of case-control labels in genetic association studies defines the emerging field of imaging genetics. The central premise here is that, should they exist, genetic associations to intermediate brain structure and brain function phenotypes are stronger than those with the categorical clinical disease statuses further down the etiological chain (Glahn et al., 2007). Again, the example of AD serves as a good illustration.
Top-down particle filtering for Bayesian decision trees
Lakshminarayanan, Balaji, Roy, Daniel M., Teh, Yee Whye
Decision tree learning is a popular approach for classification and regression in machine learning and statistics, and Bayesian formulations---which introduce a prior distribution over decision trees, and formulate learning as posterior inference given data---have been shown to produce competitive performance. Unlike classic decision tree learning algorithms like ID3, C4.5 and CART, which work in a top-down manner, existing Bayesian algorithms produce an approximation to the posterior distribution by evolving a complete tree (or collection thereof) iteratively via local Monte Carlo modifications to the structure of the tree, e.g., using Markov chain Monte Carlo (MCMC). We present a sequential Monte Carlo (SMC) algorithm that instead works in a top-down manner, mimicking the behavior and speed of classic algorithms. We demonstrate empirically that our approach delivers accuracy comparable to the most popular MCMC method, but operates more than an order of magnitude faster, and thus represents a better computation-accuracy tradeoff.
Knowledge Compilation for Model Counting: Affine Decision Trees
Koriche, Frédéric (CRIL-CNRS and Université d'Artois) | Lagniez, Jean-Marie (FMV, Johannes Kepler University) | Marquis, Pierre (CRIL-CNRS and Université d'Artois) | Thomas, Samuel (CRIL-CNRS and Université d'Artois)
Counting the models of a propositional formula is a key issue for a number of AI problems, but few propositional languages offer the possibility to count models efficiently. In order to fill the gap, we introduce the language EADT of (extended) affine decision trees. An extended affine decision tree simply is a tree with affine decision nodes and some specific decomposable conjunction or disjunction nodes. Unlike standard decision trees, the decision nodes of an EADT formula are not labeled by variables but by affine clauses. We study EADT, and several subsets of it along the lines of the knowledge compilation map. We also describe a CNF-to-EADT compiler and present some experimental results. Those results show that the EADT compilation-based approach is competitive with (and in some cases is able to outperform) the model counter Cachet and the d-DNNF compilation-based approach to model counting.
Knowledge Matters: Importance of Prior Information for Optimization
Gülçehre, Çağlar, Bengio, Yoshua
We explore the effect of introducing prior information into the intermediate level of neural networks for a learning task on which all the state-of-the-art machine learning algorithms tested failed to learn. We motivate our work from the hypothesis that humans learn such intermediate concepts from other individuals via a form of supervision or guidance using a curriculum. The experiments we have conducted provide positive evidence in favor of this hypothesis. In our experiments, a two-tiered MLP architecture is trained on a dataset with 64x64 binary inputs images, each image with three sprites. The final task is to decide whether all the sprites are the same or one of them is different. Sprites are pentomino tetris shapes and they are placed in an image with different locations using scaling and rotation transformations. The first part of the two-tiered MLP is pre-trained with intermediate-level targets being the presence of sprites at each location, while the second part takes the output of the first part as input and predicts the final task's target binary event. The two-tiered MLP architecture, with a few tens of thousand examples, was able to learn the task perfectly, whereas all other algorithms (include unsupervised pre-training, but also traditional algorithms like SVMs, decision trees and boosting) all perform no better than chance. We hypothesize that the optimization difficulty involved when the intermediate pre-training is not performed is due to the {\em composition} of two highly non-linear tasks. Our findings are also consistent with hypotheses on cultural learning inspired by the observations of optimization problems with deep learning, presumably because of effective local minima.
Machine Learning Techniques for Diagnostic Differentiation of Mild Cognitive Impairment and Dementia
Williams, Jennifer A. (Washington State University) | Weakley, Alyssa (Washington State University) | Cook, Diane J. (Washington State University) | Schmitter-Edgecombe, Maureen (Washington State University)
Detection of cognitive impairment, especially at the early stages, is critical. Such detection has traditionally been performed manually by one or more clinicians based on reports and test results. Machine learning algorithms offer an alternative method of detection that may provide an automated process and valuable insights into diagnosis and classification. In this paper, we explore the use of neuropsychological and demographic data to predict Clinical Dementia Rating (CDR) scores (no dementia, very mild dementia, dementia) and clinical diagnoses (cognitively healthy, mild cognitive impairment, dementia) through the implementation of four machine learning algorithms, naïve Bayes (NB), C4.5 decision tree (DT), back-propagation neural network (NN), and support vector machine (SVM). Additionally, a feature selection method for reducing the number of neuropsychological and demographic data needed to make an accurate diagnosis was investigated. The NB classifier provided the best accuracies, while the SVM classifier proved to offer some of the lowest accuracies. We also illustrate that with the use of feature selection, accuracies can be improved. The experiments reported in this paper indicate that artificial intelligence techniques can be used to automate aspects of clinical diagnosis of individuals with cognitive impairment.