Visual Explanation of Evidence in Additive Classifiers

AAAI Conferences

Machine-learned classifiers are important components of many data mining and knowledge discovery systems. In several application domains, an explanation of the classifier's reasoning is critical for the classifier's acceptance by the end-user. We describe a framework, ExplainD, for explaining decisions made by classifiers that use additive evidence. ExplainD applies to many widely used classifiers, including linear discriminants and many additive models. We demonstrate our ExplainD framework using implementations of naïve Bayes, linear support vector machine, and logistic regression classifiers on example applications. ExplainD uses a simple graphical explanation of the classification process to provide visualizations of the classifier's decisions, visualizations of the evidence for those decisions, the capability to speculate on the effect of changes to the data, and the capability, wherever possible, to drill down and audit the source of the evidence. We demonstrate the effectiveness of ExplainD in the context of a deployed web-based system (Proteome Analyst) and using a downloadable Python-based implementation.
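Because every classifier that ExplainD targets is additive, the decision score decomposes exactly into per-feature evidence terms, which is what the graphical explanation visualizes. A minimal sketch of that decomposition for a linear classifier (hypothetical weights and feature names; not the ExplainD implementation itself):

```python
import numpy as np

# A linear (additive-evidence) classifier: score = w . x + b.
# Each feature contributes weight * value, so the score decomposes
# exactly into per-feature evidence that can be charted directly.
weights = np.array([1.2, -0.8, 0.3])  # hypothetical learned weights
bias = -0.1
feature_names = ["feat_a", "feat_b", "feat_c"]  # hypothetical names

def explain(x):
    contributions = weights * x          # per-feature evidence
    score = contributions.sum() + bias   # total decision score
    for name, c in zip(feature_names, contributions):
        print(f"{name:>8}: {c:+.3f}")
    print(f"{'bias':>8}: {bias:+.3f}")
    print(f"{'score':>8}: {score:+.3f} -> class {int(score > 0)}")

explain(np.array([0.5, 1.0, 2.0]))
```

Speculating on the effect of a change to the data, as the abstract describes, then amounts to re-running the same decomposition on a modified input vector.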


A Comparison between Neural Networks and other Statistical Techniques for Modeling the Relationship between Tobacco and Alcohol and Cancer

Neural Information Processing Systems

Pierre Band, BC Cancer Agency, Epidemiology, 601 West 10th Ave, Vancouver BC, Canada V5Z 1L3; Joel Bert, Dept of Chemical Engineering, University of British Columbia, 2216 Main Mall, Vancouver BC, Canada V6T 1Z4; John Grace, Dept of Chemical Engineering, University of British Columbia, 2216 Main Mall, Vancouver BC, Canada V6T 1Z4

Abstract: Epidemiological data is traditionally analyzed with very simple techniques. Flexible models, such as neural networks, have the potential to discover unanticipated features in the data. However, to be useful, flexible models must have effective control on overfitting. This paper reports on a comparative study of the predictive quality of neural networks and other flexible models applied to real and artificial epidemiological data. The results suggest that there are no major unanticipated complex features in the real data, and also demonstrate that MacKay's [1995] Bayesian neural network methodology provides effective control on overfitting while retaining the ability to discover complex features in the artificial data.

1 Introduction

Traditionally, very simple statistical techniques are used in the analysis of epidemiological studies. The predominant technique is logistic regression, in which the effects of predictors are linear (or categorical) and additive on the log-odds scale.
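For reference, this baseline models the log-odds of the outcome (e.g., cancer incidence) as a linear, additive function of predictors such as tobacco and alcohol exposure; in standard notation (a textbook statement of logistic regression, not a formula reproduced from the paper):

$$\log \frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

The flexible models studied in the paper relax exactly this linearity and additivity assumption.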


Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates

arXiv.org Machine Learning

Jeffrey Negrea (University of Toronto, Vector Institute), Mahdi Haghifam (University of Toronto, Element AI), Gintare Karolina Dziugaite (Element AI), Ashish Khisti (University of Toronto), Daniel M. Roy (University of Toronto, Vector Institute)

Abstract: In this work, we improve upon the stepwise analysis of noisy iterative learning algorithms initiated by Pensia, Jog, and Loh (2018) and recently extended by Bu, Zou, and Veeravalli (2019). Our main contributions are significantly improved mutual information bounds for Stochastic Gradient Langevin Dynamics via data-dependent estimates. Our approach is based on the variational characterization of mutual information and the use of data-dependent priors that forecast the mini-batch gradient based on a subset of the training samples. Our approach is broadly applicable within the information-theoretic framework of Russo and Zou (2015) and Xu and Raginsky (2017). Our bound can be tied to a measure of flatness of the empirical risk surface. Compared with other bounds that depend on the squared norms of gradients, empirical investigations show that the terms in our bounds are orders of magnitude smaller.

1 Introduction

Stochastic subgradient methods, especially stochastic gradient descent (SGD), are at the core of recent advances in deep-learning practice. Despite some progress, developing a precise understanding of the generalization error of this class of algorithms remains wide open. Concurrently, there has been steady progress for noisy variants of SGD, such as stochastic gradient Langevin dynamics (SGLD) [13, 25, 33] and its full-batch counterpart, the Langevin algorithm [13]. The introduction of Gaussian noise to the iterates of SGD expands the set of theoretical frameworks that can be brought to bear on the study of generalization. In pioneering work, Raginsky, Rakhlin, and Telgarsky [25] exploit the fact that SGLD approximates Langevin diffusion, a continuous-time Markov process, in the small-step-size limit. One drawback of this and related analyses involving Markov processes is the reliance on mixing. We hypothesize that SGLD is not mixing in practice, so results based upon mixing may not be representative of empirical performance. In recent work, Pensia, Jog, and Loh [23] perform a stepwise analysis of a family of noisy iterative algorithms that includes SGLD and the Langevin algorithm. At the foundation of this work is the framework of Russo and Zou [28] and Xu and Raginsky [34], where mean generalization error is controlled in terms of the mutual information between the dataset and the learned parameters.
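For context, SGLD perturbs each mini-batch gradient step with Gaussian noise, and in the Xu–Raginsky framework the mean generalization error of parameters W trained on sample S of size n is bounded, for a sigma-subgaussian loss, by sqrt(2 sigma^2 I(S; W) / n). A minimal sketch of the SGLD update (hypothetical toy objective and step-size schedule; not the authors' experimental setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(w, grad, step_size, inverse_temp=1.0):
    # One SGLD update: a gradient step plus Gaussian noise,
    # w_{t+1} = w_t - eta * grad + sqrt(2 * eta / beta) * N(0, I).
    noise = rng.normal(size=w.shape)
    return w - step_size * grad + np.sqrt(2.0 * step_size / inverse_temp) * noise

# Toy usage: noisy minimization of the quadratic risk 0.5 * ||w||^2.
w = np.ones(5)
for t in range(1, 101):
    grad = w  # gradient of 0.5 * ||w||^2 at w
    w = sgld_step(w, grad, step_size=0.01 / t)
print(w)
```

It is this injected Gaussian noise that keeps the mutual information between the iterates and the data finite, which is what makes bounds of the above form non-vacuous.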


School of Information Technology and Engineering, University of Ottawa, 800 King Edward Avenue, Ottawa, Ontario K1N 6N5, Canada {vnastase,jsayyad}@site.uottawa.ca

AAAI Conferences

We study the performance of two representations of word meaning in learning noun-modifier semantic relations. One representation is based on lexical resources, in particular WordNet; the other is based on a corpus. We experimented with decision trees, instance-based learning, and Support Vector Machines. All these methods work well on this learning task. We report high precision, recall, and F-score, and small variation in performance across several 10-fold cross-validation runs. The corpus-based method has the advantage of working with data without word-sense annotations and performs well above the baseline. The WordNet-based method, which requires word-sense-annotated data, has higher precision.
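The evaluation described above — repeated 10-fold cross-validation over the three learners, reporting precision, recall, and F-score — can be sketched as follows (synthetic stand-in features and labels; not the authors' WordNet- or corpus-based feature extraction):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature matrix (one row per noun-modifier pair) and
# semantic-relation labels; placeholders for the paper's real features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 5, size=300)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
learners = [("decision tree", DecisionTreeClassifier()),
            ("instance-based (k-NN)", KNeighborsClassifier()),
            ("linear SVM", LinearSVC())]
for name, clf in learners:
    scores = cross_validate(clf, X, y, cv=cv,
                            scoring=("precision_macro", "recall_macro", "f1_macro"))
    summary = {k: round(v.mean(), 3) for k, v in scores.items()
               if k.startswith("test_")}
    print(name, summary)
```

Running several such cross-validations with different fold seeds is one way to measure the run-to-run variation the abstract reports.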


Cities Have Unique Bacterial Fingerprints: DNews

#artificialintelligence

Used to be you knew which city you were in from the food, the sports team, the historic sites, even the local brew. Now a team of microbiologists discovered they can tell cities apart by their unique bacterial fingerprints. The surprising finding was made after an intense study led by John Chase of Northern Arizona University's Department of Biological Sciences and Center for Microbial Genetics and Genomics. He and his colleagues spent a year swabbing for samples at nine offices in San Diego, Flagstaff, and Toronto. They wanted to find out what kind of impact factors like geography, location in a room, seasons, and human interaction have on the microbial communities we spread around, called microbiomes.