Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naïve Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier decisions, visualizations of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.
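As a rough illustration of what "additive evidence" can mean for a linear classifier, the sketch below splits a decision score into per-feature contributions and re-scores an instance after a speculative change to one feature. The weights, feature names, and the helper functions `evidence` and `what_if` are hypothetical and are not taken from the ExplainD implementation.

```python
from typing import Dict, List, Tuple


def evidence(weights: Dict[str, float], bias: float,
             x: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the total score and the per-feature contributions w_i * x_i."""
    contributions = [(name, weights[name] * x.get(name, 0.0)) for name in weights]
    # Strongest evidence (largest absolute contribution) first.
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    score = bias + sum(value for _, value in contributions)
    return score, contributions


def what_if(weights: Dict[str, float], bias: float, x: Dict[str, float],
            feature: str, new_value: float) -> float:
    """Score the instance again after changing a single feature value."""
    changed = dict(x)
    changed[feature] = new_value
    return evidence(weights, bias, changed)[0]


# Hypothetical model and instance, for illustration only.
weights = {"f1": 2.0, "f2": -1.0, "f3": 0.5}
bias = -0.5
x = {"f1": 1.0, "f2": 3.0, "f3": 2.0}

score, contributions = evidence(weights, bias, x)
label = "positive" if score > 0 else "negative"
for name, value in contributions:
    print(f"{name}: {value:+.2f}")
print(f"score = {score:+.2f} -> {label}")
```

Because the score is a plain sum, each contribution can be displayed on its own (for example as a bar), and a speculative change such as `what_if(weights, bias, x, "f2", 0.0)` only alters the one term it touches.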
Pierre Band
BC Cancer Agency
601 West 10th Ave, Epidemiology
Vancouver BC Canada V5Z 1L3

Joel Bert
Dept of Chemical Engineering
University of British Columbia
2216 Main Mall
Vancouver BC Canada V6T 1Z4

John Grace
Dept of Chemical Engineering
University of British Columbia
2216 Main Mall
Vancouver BC Canada V6T 1Z4

Abstract

Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's Bayesian neural network methodology provides effective control on overfitting while retaining the ability to discover complex features in the artificial data.

1 Introduction

Traditionally, very simple statistical techniques are used in the analysis of epidemiological studies. The predominant technique is logistic regression, in which the effects of predictors are linear (or categorical) and additive on the log-odds scale.
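To make "linear and additive on the log-odds scale" concrete, here is a minimal sketch of a logistic regression prediction. The intercept, coefficients, and predictor values are made up for illustration and do not come from the study's data.

```python
import math
from typing import Sequence


def log_odds(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor: each predictor adds coef * value to the log-odds."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


def probability(z: float) -> float:
    """Map log-odds back to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model with two predictors; the second is a 0/1 indicator.
intercept = -2.0
coefs = [0.5, 1.0]
baseline = [2.0, 0.0]
exposed = [2.0, 1.0]

z0 = log_odds(intercept, coefs, baseline)
z1 = log_odds(intercept, coefs, exposed)

# Switching the indicator on shifts the log-odds by its coefficient,
# so the odds are multiplied by exp(coefficient) whatever the other predictor is.
odds_ratio = math.exp(z1 - z0)
print(probability(z0), probability(z1), odds_ratio)
```

The additivity is what makes the model simple: the odds ratio for one predictor does not depend on the values of the others, which is exactly the restriction a flexible model relaxes.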
We study the performance of two representations of word meaning in learning noun-modifier semantic relations. One representation is based on lexical resources, in particular WordNet; the other is based on a corpus. We experimented with decision trees, instance-based learning and Support Vector Machines. All these methods work well in this learning task. We report high precision, recall and F-score, and small variation in performance across several 10-fold cross-validation runs. The corpus-based method has the advantage of working with data without word-sense annotations and performs well above the baseline. The WordNet-based method, which requires word-sense-annotated data, has higher precision.
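For readers unfamiliar with the evaluation measures named above, the following sketch computes precision, recall and F-score for one relation label and builds index splits for 10-fold cross-validation. The relation labels and predictions are invented, and `precision_recall_f1` and `k_fold_indices` are hypothetical helpers, not the authors' evaluation code.

```python
from typing import List, Tuple


def precision_recall_f1(gold: List[str], predicted: List[str],
                        target: str) -> Tuple[float, float, float]:
    """Precision, recall and F-score for a single target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def k_fold_indices(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k (train, test) pairs; each index is tested once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


# Invented gold labels and predictions for five noun-modifier pairs.
gold = ["causal", "causal", "temporal", "causal", "temporal"]
pred = ["causal", "temporal", "temporal", "causal", "causal"]
p, r, f = precision_recall_f1(gold, pred, "causal")
splits = k_fold_indices(20, k=10)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} folds={len(splits)}")
```

In a full run, the scores would be computed on the test portion of each split and then averaged; repeating the procedure with different shuffles gives the run-to-run variation the abstract refers to.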
Used to be you knew which city you were in from the food, the sports team, the historic sites, even the local brew. Now a team of microbiologists has discovered they can tell cities apart by their unique bacterial fingerprints. The surprising finding came out of an intensive study led by John Chase of Northern Arizona University's Department of Biological Sciences and Center for Microbial Genetics and Genomics. He and his colleagues spent a year swabbing for samples at nine offices in San Diego, Flagstaff, and Toronto. They wanted to find out how factors like geography, location in a room, seasons, and human interaction affect the microbial communities we spread around, called microbiomes.
Daniel Saunders participated in the Insight Health Data Science program in the Fall of 2016, and currently works as a Data Scientist at Wayfair. Previously, Daniel was a postdoctoral fellow at the Center for Mind/Brain Sciences of the University of Trento, and received his PhD in Psychology from Queen's University. While at Insight, Daniel built an NLP-driven engine to generate stress impact scores for newspaper front pages, trained on the reactions of Facebook users to news story headlines. In this blog post, he describes his creative process in developing this project.

For my Insight Health Data Science project, I wanted to tackle a problem related to mental health, since my PhD is in Psychology and my father is a mental health advocate in British Columbia.