Goto

Collaborating Authors

 Learning Graphical Models


Edge.org

#artificialintelligence

Perhaps the most important news of our day is that datasets--not algorithms--might be the key limiting factor to development of human-level artificial intelligence. At the dawn of the field of artificial intelligence, in 1967, two of its founders famously anticipated that solving the problem of computer vision would take only a summer. Now, almost a half century later, machine learning software finally appears poised to achieve human-level performance on vision tasks and a variety of other grand challenges. What took the AI revolution so long? A review of the timing of the most publicized AI advances over the past thirty years suggests a provocative explanation: perhaps many major AI breakthroughs have actually been constrained by the availability of high-quality training datasets, and not by algorithmic advances.


A Minimalistic Approach to Sum-Product Network Learning for Real Applications

arXiv.org Machine Learning

Sum-Product Networks (SPNs) are a class of expressive yet tractable hierarchical graphical models. LearnSPN is a structure learning algorithm for SPNs that uses hierarchical co-clustering to simultaneously identifying similar entities and similar features. The original LearnSPN algorithm assumes that all the variables are discrete and there is no missing data. We introduce a practical, simplified version of LearnSPN, MiniSPN, that runs faster and can handle missing data and heterogeneous features common in real applications. We demonstrate the performance of MiniSPN on standard benchmark datasets and on two datasets from Google's Knowledge Graph exhibiting high missingness rates and a mix of discrete and continuous features.


Semi-supervised Learning with Induced Word Senses for State of the Art Word Sense Disambiguation

Journal of Artificial Intelligence Research

Word Sense Disambiguation (WSD) aims to determine the meaning of a word in context, and successful approaches are known to benefit many applications in Natural Language Processing. Although supervised learning has been shown to provide superior WSD performance, current sense-annotated corpora do not contain a sufficient number of instances per word type to train supervised systems for all words. While unsupervised techniques have been proposed to overcome this data sparsity problem, such techniques have not outperformed supervised methods. In this paper, we propose a new approach to building semi-supervised WSD systems that combines a small amount of sense-annotated data with information from Word Sense Induction, a fully-unsupervised technique that automatically learns the different senses of a word based on how it is used. In three experiments, we show how sense induction models may be effectively combined to ultimately produce high-performance semi-supervised WSD systems that exceed the performance of state-of-the-art supervised WSD techniques trained on the same sense-annotated data. We anticipate that our results and released software will also benefit evaluation practices for sense induction systems and those working in low-resource languages by demonstrating how to quickly produce accurate WSD systems with minimal annotation effort.


Naive Bayes for Dummies; A Simple Explanation

@machinelearnbot

This blog post was originally published as part of an ongoing series, "Popular Algorithms Explained in Simple English" on the AYLIEN Text Analysis Blog. Commonly used in Machine Learning, Naive Bayes is a collection of classification algorithms based on Bayes Theorem. It is not a single algorithm but a family of algorithms that all share a common principle, that every feature being classified is independent of the value of any other feature. So for example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A Naive Bayes classifier considers each of these "features" (red, round, 3" in diameter) to contribute independently to the probability that the fruit is an apple, regardless of any correlations between features.


Robust Estimators in High Dimensions without the Computational Intractability

arXiv.org Machine Learning

We study high-dimensional distribution learning in an agnostic setting where an adversary is allowed to arbitrarily corrupt an $\varepsilon$-fraction of the samples. Such questions have a rich history spanning statistics, machine learning and theoretical computer science. Even in the most basic settings, the only known approaches are either computationally inefficient or lose dimension-dependent factors in their error guarantees. This raises the following question:Is high-dimensional agnostic distribution learning even possible, algorithmically? In this work, we obtain the first computationally efficient algorithms with dimension-independent error guarantees for agnostically learning several fundamental classes of high-dimensional distributions: (1) a single Gaussian, (2) a product distribution on the hypercube, (3) mixtures of two product distributions (under a natural balancedness condition), and (4) mixtures of spherical Gaussians. Our algorithms achieve error that is independent of the dimension, and in many cases scales nearly-linearly with the fraction of adversarially corrupted samples. Moreover, we develop a general recipe for detecting and correcting corruptions in high-dimensions, that may be applicable to many other problems.


Markov models for ocular fixation locations in the presence and absence of colour

arXiv.org Machine Learning

We propose to model the fixation locations of the human eye when observing a still image by a Markovian point process in R 2 . Our approach is data driven using k-means clustering of the fixation locations to identify distinct salient regions of the image, which in turn correspond to the states of our Markov chain. Bayes factors are computed as model selection criterion to determine the number of clusters. Furthermore, we demonstrate that the behaviour of the human eye differs from this model when colour information is removed from the given image.


Sparse group factor analysis for biclustering of multiple data sources

arXiv.org Machine Learning

Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments. These biclustering techniques have focused on one data source, often gene expression data. We present a Bayesian approach for joint biclustering of multiple data sources, extending a recent method Group Factor Analysis (GFA) to have a biclustering interpretation with additional sparsity assumptions. The resulting method enables data-driven detection of linear structure present in parts of the data sources. Results: Our simulation studies show that the proposed method reliably infers bi-clusters from heterogeneous data sources. We tested the method on data from the NCI-DREAM drug sensitivity prediction challenge, resulting in an excellent prediction accuracy. Moreover, the predictions are based on several biclusters which provide insight into the data sources, in this case on gene expression, DNA methylation, protein abundance, exome sequence, functional connectivity fingerprints and drug sensitivity.


Bayes' Theorem And Robot Arms Open Data Science Conferences

#artificialintelligence

If you enjoyed Jesse's presentation at ODSC's last Boston Big Data Conference come to ODSC East this May to hear out his colleagues. Rather than start with the statement of Bayes' Theorem, I want to use an old math teacher trick (which I realize many students hate) of trying to derive it from scratch, without stating what we're trying to derive. Rather, we'll start by modifying a problem that I described in an earlier post on probability distributions1. Bayes' gives you a way of determining the probability that a given event will occur, or that a given condition is true, given your knowledge of another related event or condition. All the examples that I've read or heard about seemed somewhat contrived and unrelated to the sorts of data analysis I was interested in.


Deep Learning in Neural Networks: An Overview

#artificialintelligence

What a wonderful treasure trove this paper is! Schmidhuber provides all the background you need to gain an overview of deep learning (as of 2014) and how we got there through the preceding decades. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond. The main part of the paper runs to 35 pages, and then there are 53 pages of references. Now, I know that many of you think I read a lot of papers – just over 200 a year on this blog – but if I did nothing but review these key works in the development of deep learning it would take me about 4.5 years to get through them at that rate! And when I'd finished I'd still be about 6 years behind the then current state of the art!


Elements of machine learning

@machinelearnbot

The official title of this free book available in PDF format is Machine Learning Cheat Sheet. See table of content screenshot below. The chapters 17 to 28 (the most interesting ones in my opinion) seem like a work in progress - I'm sure the authors intend to make them a bit bigger. For a more modern and applied book, get Dr Granville's book on data science.