Diagnosis
Implementing Troubleshooting with Batch Repair
Stern, Roni (Ben Gurion University of the Negev) | Kalech, Meir (Ben Gurion University of the Negev) | Shinitzky, Hilla (Ben Gurion University of the Negev)
Recent work has raised the challenge of efficient automated troubleshooting in domains where repairing a set of components in a single repair action is cheaper than repairing each of them separately. This corresponds to cases where there is a non-negligible overhead to initiating a repair action and to testing the system after a repair action. In this work we propose several algorithms for choosing which batch of components to repair, so as to minimize the overall repair costs. Experimentally, we show the benefit of these algorithms over repairing components one at a time.
Visualizing a Decision Tree - Machine Learning Recipes #2
Last episode, we treated our Decision Tree as a blackbox. In this episode, we'll build one on a real dataset, add code to visualize it, and practice reading it - so you can see how it works under the hood. And hey -- I may have gone a little fast through some parts. Just let me know, I'll slow down. Also: we'll do a Q&A episode down the road, so if anything is unclear, just ask! Subscribe to the Google Developers: http://goo.gl/mQyv5L
Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data
Decision tree classifiers are a widely used tool in data stream mining. The use of confidence intervals to estimate the gain associated with each split leads to very effective methods, like the popular Hoeffding tree algorithm. From a statistical viewpoint, the analysis of decision tree classifiers in a streaming setting requires knowing when enough new information has been collected to justify splitting a leaf. Although some of the issues in the statistical analysis of Hoeffding trees have already been clarified, a general and rigorous study of confidence intervals for splitting criteria is missing. We fill this gap by deriving accurate confidence intervals to estimate the splitting gain in decision tree learning with respect to three criteria: entropy, Gini index, and a third index proposed by Kearns and Mansour. Our confidence intervals depend in a more detailed way on the tree parameters. We also extend our confidence analysis to a selective sampling setting, in which the decision tree learner adaptively decides which labels to query in the stream. We furnish a theoretical guarantee bounding the probability that the classification is non-optimal when the decision tree is learned via our selective sampling strategy. Experiments on real and synthetic data in a streaming setting show that our trees are indeed more accurate than trees with the same number of leaves generated by other techniques, and that our active learning module saves labeling cost. In addition, comparing our labeling strategy with recent methods, we show that our approach is more robust and consistent than the other techniques applied to incremental decision trees.
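To make the idea of "knowing when enough information has been collected to justify splitting a leaf" concrete, here is a minimal sketch of the classic Hoeffding-bound split test used by standard Hoeffding trees. It is not the refined confidence intervals derived in the abstract above; the function names and the example numbers are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Deviation epsilon such that, with probability at least 1 - delta,
    the empirical mean of n observations (each within an interval of
    width value_range) is within epsilon of the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf only when the observed advantage of the best split
    over the runner-up exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical numbers: the same observed gain gap is not enough after
# 100 examples, but is enough after 1000.
few = should_split(0.30, 0.25, value_range=1.0, delta=0.05, n=100)
many = should_split(0.30, 0.25, value_range=1.0, delta=0.05, n=1000)
```

The bound shrinks as 1/sqrt(n), so a leaf with a small but persistent gain gap is eventually split once enough stream examples have reached it.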
Parallel Model-Based Diagnosis on Multi-Core Computers
Jannach, Dietmar, Schmitz, Thomas, Shchekotykhin, Kostyantyn
Model-Based Diagnosis (MBD) is a principled and domain-independent way of analyzing why a system under examination is not behaving as expected. Given an abstract description (model) of the system's components and their behavior when functioning normally, MBD techniques rely on observations about the actual system behavior to reason about possible causes when there are discrepancies between the expected and observed behavior. Due to its generality, MBD has been successfully applied in a variety of application domains over the last decades. In many application domains of MBD, testing different hypotheses about the reasons for a failure can be computationally costly, e.g., because complex simulations of the system behavior have to be performed. In this work, we therefore propose different schemes of parallelizing the diagnostic reasoning process in order to better exploit the capabilities of modern multi-core computers. We propose and systematically evaluate parallelization schemes for Reiter's hitting set algorithm for finding all or a few leading minimal diagnoses using two different conflict detection techniques. Furthermore, we perform initial experiments for a basic depth-first search strategy to assess the potential of parallelization when searching for one single diagnosis. Finally, we test the effects of parallelizing "direct encodings" of the diagnosis problem in a constraint solver.
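As a rough illustration of the relationship between conflicts and diagnoses that hitting set algorithms exploit, the sketch below enumerates minimal hitting sets by brute force in order of increasing size. It is a simplified, sequential stand-in: it is neither Reiter's HS-tree with its pruning rules nor any of the parallel schemes the abstract describes, and the component names are hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Return all minimal sets of components that intersect every conflict.

    Candidates are tried in order of increasing size, so any candidate
    that contains an already-found hitting set is non-minimal and skipped.
    """
    conflicts = [frozenset(c) for c in conflicts]
    components = sorted(set().union(*conflicts)) if conflicts else []
    found = []
    for size in range(len(components) + 1):
        for candidate in combinations(components, size):
            cand = frozenset(candidate)
            if any(h <= cand for h in found):
                continue  # superset of a smaller diagnosis, not minimal
            if all(cand & c for c in conflicts):
                found.append(cand)
    return found


# Hypothetical conflicts: at least one of A, B is faulty, and at least
# one of B, C is faulty.
diagnoses = minimal_hitting_sets([{"A", "B"}, {"B", "C"}])
```

Here the minimal diagnoses are {B} and {A, C}. The enumeration is exponential in the number of components, which hints at why the diagnostic search is worth parallelizing.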
Preface: The Beyond NP Workshop
Darwiche, Adnan (University of California, Los Angeles) | Marques-Silva, Joao (University of Lisbon) | Marquis, Pierre (Université d'Artois)
A new computational paradigm has emerged in computer science over the past few decades, which is exemplified by the use of SAT solvers to tackle problems in the complexity class NP. [...] and engineering investment is made towards developing highly efficient solvers for a prototypical problem [...] The cost of this investment is then amortized as these solvers are applied to a broader class of problems via reductions (in contrast to developing dedicated algorithms for each encountered problem). SAT solvers, for example, are now routinely used to solve problems in [...] both Renault and Toyota have deployed online configuration systems based on knowledge compilation). QBF solvers have been used in model checking, verification, debugging, [...] Finally, function problem solvers have been used in model-based diagnosis, design debugging, CAD and bioinformatics. [...] on a variety of topics, including algorithms; descriptions of implementations and/or evaluations of beyond NP solvers; their applications (including encodings); the complexity classes they reach; and their connections to one another.
How to Bin or Convert Numerical Variables to Categorical Variables with Decision Trees
This is a guest repost by Jacob Joseph from CleverTap. Why would you want to convert a numerical variable into a categorical one? Depending on the situation, it can lead to a better interpretation of the numerical variable, quick segmentation, or just an additional feature for building your predictive model by creating bins for the numerical variable. Binning is a popular feature engineering technique. Suppose your hypothesis is that the age of a customer is correlated with their tendency to interact with a mobile app.
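For the age example, a minimal sketch of tree-based binning is a depth-one tree (a single split) that picks the age cut point minimizing weighted Gini impurity of the target. The ages, the "active" labels, and the function names below are made up for illustration and are not taken from the original post; a full decision tree would repeat the same search inside each resulting bin.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Cut point on a numeric variable that minimizes the weighted Gini
    impurity of the two resulting bins (one decision-tree split)."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_score = score
    return best_cut


# Hypothetical data: customer ages and whether each one interacts with the app.
ages = [18, 22, 25, 31, 40, 47, 52, 60]
active = [1, 1, 1, 1, 0, 0, 0, 0]

cut = best_threshold(ages, active)
bins = ["<= %.1f" % cut if a <= cut else "> %.1f" % cut for a in ages]
```

On this toy data the split lands midway between 31 and 40, turning the numeric age into two categories that separate active from inactive customers.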
Decision Trees - Introduction
Decision trees are simple and powerful types of multiple variable analysis. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. These segments form an inverted decision tree that originates with a root node at the top of the tree. The object of analysis is reflected in this root node as a simple, one-dimensional display in the decision tree interface. The name of the field of data that is the object of analysis is usually displayed, along with the spread or distribution of the values that are contained in that field.
Decision tree vs. linearly separable or non-separable pattern
As part of a series of posts discussing how a machine learning classifier works, I ran a decision tree to classify an XY-plane, trained with XOR patterns or linearly separable patterns. Its decision boundary was drawn almost perfectly parallel to the assumed true boundary, i.e. Awful result, it appears to never follow the true boundary. Just a little improved, but it still appears to be overfitted. Even worse... it appears to get more overfitted than the case of 2-classes.
Study: Smartphone app that listens to breathing, determines respiratory diseases is 89 percent accurate
A smartphone-based system for diagnosing respiratory diseases achieved an accuracy of 89 percent in a recent clinical study of 524 pediatric patients conducted by the company at Joondalup Health Campus (JHC) and Princess Margaret Hospital (PMH) in Perth, Western Australia. Perth-based ResApp essentially uses the smartphone microphone as a stethoscope to listen to a patient's breathing. But instead of relying solely on a doctor's ears to form a diagnosis from those sounds, ResApp has been developing machine-learning algorithms that will automatically determine which respiratory condition a patient might have, including pneumonia, asthma, bronchiolitis and COPD. In the future, the company hopes to integrate those algorithms into telehealth offerings as well as making them available for clinical use. ResApp released data from this trial previously in November, but that data set included fewer patients.
Paging Dr. Robot: The Coming AI Health Care Boom
More than six billion dollars: That's how much health care providers and consumers will be spending every year on artificial intelligence tools by 2021--a tenfold increase from today--according to a new report from research firm Frost & Sullivan. AI will be everywhere--from diagnosing cancer to providing weight-loss coaching, says Venkat Rajan, who has the great title of global director for the company's Visionary Healthcare Program. "Prior to 2015, most of what was happening was sort of academic: pilot programs, exploratory, proof of concept-type stuff," he says. AI's ability to sort through scads of information, and remember everything it has ever seen, could enable a digital (and congenial) version of Dr. House, the brilliant diagnostician from the eponymous TV show, says Rajan. "At first, it's a complete mystery, it could be one of ten different things," he says, about the process in the show, and real life, called differential diagnosis. "And then he's able to sort through various issues, you know, illuminate certain factors on why it's not one of these other conditions, and he's able to pull something from memory that figures out ultimately what it is, and they can provide the appropriate treatment."