Diagnosis
Visualizing a Decision Tree - Machine Learning Recipes #2
Last episode, we treated our Decision Tree as a black box. In this episode, we'll build one on a real dataset, add code to visualize it, and practice reading it, so you can see how it works under the hood. And hey -- I may have gone a little fast through some parts. Just let me know, and I'll slow down. Also: we'll do a Q&A episode down the road, so if anything is unclear, just ask! Follow https://twitter.com/random_forests
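To make "reading" a tree concrete, here is a minimal sketch that prints a tiny tree as nested if/else text and follows one sample from the root to a leaf. The tree, its feature name (`width_cm`), thresholds, and class labels are hypothetical and hand-written for illustration; they are not the tree or the dataset from the episode.

```python
# Hypothetical hand-written tree (not learned from data).
# Internal nodes are (feature, threshold, left, right); leaves are label strings.
TREE = ("width_cm", 0.8,
        "class_A",
        ("width_cm", 1.75, "class_B", "class_C"))


def render(node, indent=""):
    """Return the tree as indented text lines, one question or leaf per line."""
    if isinstance(node, str):
        return [f"{indent}predict: {node}"]
    feature, threshold, left, right = node
    lines = [f"{indent}if {feature} <= {threshold}:"]
    lines += render(left, indent + "    ")
    lines.append(f"{indent}else:")
    lines += render(right, indent + "    ")
    return lines


def predict(node, sample):
    """Follow the questions from the root until a leaf is reached."""
    while not isinstance(node, str):
        feature, threshold, left, right = node
        node = left if sample[feature] <= threshold else right
    return node


print("\n".join(render(TREE)))
print(predict(TREE, {"width_cm": 1.3}))
```

Each printed `if` line is one question the tree asks; the path a sample takes through those questions is the explanation for its prediction.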
ONLamp.com: Building Decision Trees in Python
The decision tree in the figure is just one of many decision tree structures you could create to solve the marketing problem. The task of finding the optimal decision tree is an intractable problem. For those of you who have taken an analysis of algorithms course, you no doubt recognize this term. For those of you who haven't had this pleasure (he says, gritting his teeth), essentially what this means is that as the amount of data used to train the decision tree grows, the amount of time it takes to find the optimal tree grows as well--exponentially. While it may be nearly impossible to find the smallest (or more fittingly, the shallowest) decision tree in a respectable amount of time, it is possible to find a decision tree that is "small enough" using special heuristics.
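The excerpt does not say which heuristic it has in mind. One widely used choice is greedy splitting by information gain, as in ID3-style learners: at each node, pick the attribute whose split reduces label entropy the most, without looking ahead. The sketch below assumes that heuristic; the attribute names and the four toy records are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def best_attribute(rows, attributes, target):
    """Greedy step: the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy marketing records (illustrative only).
rows = [
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "young", "income": "low", "buys": "yes"},
    {"age": "old", "income": "high", "buys": "no"},
    {"age": "old", "income": "low", "buys": "no"},
]

print(best_attribute(rows, ["age", "income"], "buys"))
```

Applying this step recursively to each resulting subset yields a tree quickly, at the cost of giving up any guarantee that the tree is the smallest possible.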
Learning Decision Trees from Histogram Data Using Multiple Subsets of Bins
Gurung, Ram B. (Stockholm University) | Lindgren, Tony (Stockholm University) | Boström, Henrik (Stockholm University)
The standard approach of learning decision trees from histogram data is to treat the bins as independent variables. However, as the underlying dependencies among the bins might not be completely exploited by this approach, an algorithm has been proposed for learning decision trees from histogram data by considering all bins simultaneously while partitioning examples at each node of the tree. Although the algorithm has been demonstrated to improve predictive performance, its computational complexity has turned out to be a major bottleneck, in particular for histograms with a large number of bins. In this paper, we propose instead a sliding window approach to select subsets of the bins to be considered simultaneously while partitioning examples. This significantly reduces the number of possible splits to consider, allowing for substantially larger histograms to be handled. We also propose to evaluate the original bins independently, in addition to evaluating the subsets of bins when performing splits. This ensures that the information obtained by treating bins simultaneously is an additional gain compared to what is considered by the standard approach. Results of experiments on applying the new algorithm to both synthetic and real world datasets demonstrate positive results in terms of predictive performance without excessive computational cost.
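A rough sketch of why a sliding window cuts down the search: instead of considering arbitrary subsets of bins, only runs of adjacent bins are candidates. The abstract does not give the window sizes or the step, so the contiguous, step-one windows and the function names below are assumptions made for illustration, not the authors' algorithm.

```python
from math import comb


def sliding_windows(num_bins, window_size):
    """Return index tuples of contiguous bins, sliding one bin at a time."""
    if not 1 <= window_size <= num_bins:
        raise ValueError("window_size must be between 1 and num_bins")
    return [tuple(range(start, start + window_size))
            for start in range(num_bins - window_size + 1)]


def candidate_bin_sets(num_bins, window_sizes):
    """Single bins (the standard approach) plus sliding-window subsets."""
    candidates = [(i,) for i in range(num_bins)]
    for size in window_sizes:
        if size > 1:
            candidates.extend(sliding_windows(num_bins, size))
    return candidates


windows = sliding_windows(6, 3)
print(windows)
print(len(windows), "windows versus", comb(6, 3), "arbitrary 3-bin subsets")
```

The number of windows of a given size grows linearly with the number of bins, whereas the number of arbitrary subsets of that size grows combinatorially. Keeping the single bins in the candidate list mirrors the abstract's point that the original bins are still evaluated independently.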
Playing with Continuous uncertainty in Decision Trees • /r/MachineLearning
Classically, for decision trees we define a split or various "buckets" to transform continuous data into discrete data. The data I am currently processing has uncertainty associated with it (each data point comes from an aggregate set). As such, I might define a boundary, let's say N, where a data point's uncertainty could place it in multiple buckets (say the parameter value N? Normally these boundaries are binary, but I was considering counting these 'overlapping instances' towards both buckets, weighted by their respective probabilities. This doesn't seem to violate the entropy term (total probability will still sum to 1). However, I can't place half an instance within a branch, which would destroy the meaning behind the term.
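One way to experiment with this idea is to let each instance contribute fractional weight to both buckets and to compute entropy from the summed weights rather than from whole counts. The sketch below assumes Gaussian uncertainty with a known, strictly positive standard deviation per point; that noise model, the function names, and the sample points are all assumptions for illustration.

```python
import math


def prob_below(value, sigma, boundary):
    """P(true value < boundary), assuming Gaussian noise with std `sigma` > 0."""
    return 0.5 * (1.0 + math.erf((boundary - value) / (sigma * math.sqrt(2.0))))


def weighted_entropy(class_weights):
    """Entropy in bits of a dict mapping class label -> total fractional weight."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in class_weights.values() if w > 0)


def soft_split(points, boundary):
    """Spread each (value, sigma, label) point across both buckets by probability."""
    left, right = {}, {}
    for value, sigma, label in points:
        p = prob_below(value, sigma, boundary)
        left[label] = left.get(label, 0.0) + p
        right[label] = right.get(label, 0.0) + (1.0 - p)
    return left, right


# Hypothetical measurements: (observed value, uncertainty, class label).
points = [(1.0, 0.5, "a"), (2.9, 0.5, "a"), (3.1, 0.5, "b"), (5.0, 0.5, "b")]
left, right = soft_split(points, boundary=3.0)
total_weight = sum(left.values()) + sum(right.values())
print(left, right, total_weight)
print(weighted_entropy(left), weighted_entropy(right))
```

The total weight across both buckets still equals the number of instances, so the class proportions in each bucket remain valid probabilities even though no bucket holds a whole number of instances.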
Change Detection in Multivariate Datastreams: Likelihood and Detectability Loss
Alippi, Cesare, Boracchi, Giacomo, Carrera, Diego, Roveri, Manuel
We address the problem of detecting changes in multivariate datastreams, and we investigate the intrinsic difficulty that change-detection methods have to face when the data dimension scales. In particular, we consider a general approach where changes are detected by comparing the distribution of the log-likelihood of the datastream over different time windows. Although this approach underlies several change-detection methods, its effectiveness when the data dimension scales has never been investigated; investigating it is the goal of our paper. We show that the magnitude of the change can be naturally measured by the symmetric Kullback-Leibler divergence between the pre- and post-change distributions, and that the detectability of a change of a given magnitude worsens when the data dimension increases. This problem, which we refer to as "detectability loss", is due to the linear relationship between the variance of the log-likelihood and the data dimension. We analytically derive the detectability loss on Gaussian-distributed datastreams, and empirically demonstrate that this problem also holds on real-world datasets and that it can be harmful even at low data dimensions (say, 10).
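The linear relationship can be checked numerically in the simplest Gaussian case. For x drawn from a standard d-dimensional Gaussian, the log-likelihood is a constant minus half a chi-square variable with d degrees of freedom, so its variance is d/2. For a pure mean shift with identity covariance, the symmetric Kullback-Leibler divergence is the squared norm of the shift. The sketch below covers only this special case (identity covariance, mean shift); the function names are illustrative and this is not the paper's derivation.

```python
import math
import random
import statistics


def gaussian_loglik(x):
    """Log-density of x under a standard d-dimensional Gaussian N(0, I)."""
    d = len(x)
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * sum(v * v for v in x)


def symmetric_kl_mean_shift(delta):
    """KL(p||q) + KL(q||p) for p = N(0, I), q = N(delta, I): equals ||delta||^2."""
    return sum(v * v for v in delta)


def loglik_variance(d, n_samples, rng):
    """Monte Carlo estimate of Var[log p(x)] for x ~ N(0, I_d)."""
    values = [gaussian_loglik([rng.gauss(0.0, 1.0) for _ in range(d)])
              for _ in range(n_samples)]
    return statistics.variance(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {d: loglik_variance(d, 20000, rng) for d in (2, 8, 32)}
for d, var in estimates.items():
    print(f"d={d:2d}  estimated variance={var:.2f}  theory d/2={d / 2:.2f}")
```

For a change of fixed magnitude, the shift in the expected log-likelihood stays the same while its standard deviation grows like the square root of d/2, which is one way to see why the same change becomes harder to detect in higher dimensions.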
Paging Dr. Robot: The Coming AI Health Care Boom
More than six billion dollars: That's how much health care providers and consumers will be spending every year on artificial intelligence tools by 2021--a tenfold increase from today--according to a new report from research firm Frost & Sullivan. AI will be everywhere--from diagnosing cancer to providing weight-loss coaching, says Venkat Rajan, who has the great title of global director for the company's Visionary Healthcare Program. "Prior to 2015, most of what was happening was sort of academic: pilot programs, exploratory, proof of concept-type stuff," he says. AI's ability to sort through scads of information, and remember everything it has ever seen, could enable a digital (and congenial) version of Dr. House, the brilliant diagnostician from the eponymous TV show, says Rajan. "At first, it's a complete mystery, it could be one of ten different things," he says, about the process in the show, and real life, called differential diagnosis. "And then he's able to sort through various issues, you know, illuminate certain factors on why it's not one of these other conditions, and he's able to pull something from memory that figures out ultimately what it is, and they can provide the appropriate treatment." Robots won't steal doctors' jobs, says Rajan, but they will spare overworked docs some of the dangerous fatigue that can lead to mistakes.
Sparse Perceptron Decision Tree for Millions of Dimensions
Liu, Weiwei (University of Technology) | Tsang, Ivor W. (University of Technology)
Owing to their nonlinear but highly interpretable representations, decision tree (DT) models have attracted a great deal of attention from researchers. However, DT models usually suffer from the curse of dimensionality, and their performance degrades when there are many noisy features. To address these issues, this paper first presents a novel data-dependent generalization error bound for the perceptron decision tree (PDT), which provides the theoretical justification to learn a sparse linear hyperplane in each decision node and to prune the tree. Following our analysis, we introduce the notion of a sparse perceptron decision node (SPDN) with a budget constraint on the weight coefficients, and propose a sparse perceptron decision tree (SPDT) algorithm to achieve nonlinear prediction performance. To avoid generating an unstable and complicated decision tree and to improve the generalization of the SPDT, we present a pruning strategy that learns classifiers to minimize cross-validation errors on each SPDN. Extensive empirical studies verify that our SPDT is more resilient to noisy features and effectively generates a small, yet accurate decision tree. Compared with state-of-the-art DT methods and SVM, our SPDT achieves better generalization performance on ultrahigh dimensional problems with more than 1 million features.
Causal Explanation Under Indeterminism: A Sampling Approach
Merck, Christopher A. (Stevens Institute of Technology) | Kleinberg, Samantha (Stevens Institute of Technology)
One of the key uses of causes is to explain why things happen. Explanations of specific events, like an individual's heart attack on Monday afternoon or a particular car accident, help assign responsibility and inform our future decisions. Computational methods for causal inference make use of the vast amounts of data collected by individuals to better understand their behavior and improve their health. However, most methods for explanation of specific events have provided theoretical approaches with limited applicability. In contrast we make two main contributions: an algorithm for explanation that calculates the strength of token causes, and an evaluation based on simulated data that enables objective comparison against prior methods and ground truth. We show that the approach finds the correct relationships in classic test cases (causal chains, common cause, and backup causation) and in a realistic scenario (explaining hyperglycemic episodes in a simulation of type 1 diabetes).
Automated Verification and Tightening of Failure Propagation Models
Bittner, Benjamin (Fondazione Bruno Kessler) | Bozzano, Marco (Fondazione Bruno Kessler) | Cimatti, Alessandro (Fondazione Bruno Kessler) | Zampedri, Gianni (Fondazione Bruno Kessler)
Timed Failure Propagation Graphs (TFPGs) are used in the design of safety-critical systems as a way of modeling failure propagation, and to evaluate and implement diagnostic systems. TFPGs are a very rich formalism: they allow modeling Boolean combinations of faults and events, which may also depend on the operational modes of the system, as well as quantitative delays between them. TFPGs are often produced manually, from a given dynamic system of greater complexity, as abstract representations of the system behavior under specific faulty conditions. In this paper we tackle two key difficulties in this process: first, how to make sure that no important behavior of the system is overlooked in the TFPG, and that no spurious, non-existent behavior is introduced; second, how to devise the correct values for the delays between events. We propose a model checking approach to automatically validate the completeness and tightness of a TFPG for a given infinite-state dynamic system, and a procedure for the automated synthesis of the delay parameters. The proposed approach is evaluated on a number of synthetic and industrial benchmarks.