Learning Graphical Models


Optimal Inference in Crowdsourced Classification via Belief Propagation

arXiv.org Machine Learning

Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid workers. We study the problem of recovering the true labels from the possibly erroneous crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap by introducing a tighter lower bound on the fundamental limit and proving that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. Experimental results suggest that BP is close to optimal for all regimes considered and improves upon competing state-of-the-art algorithms.


Modeling Grasp Motor Imagery through Deep Conditional Generative Models

arXiv.org Machine Learning

Grasping is a complex process involving knowledge of the object, the surroundings, and of oneself. While humans are able to integrate and process all of the sensory information required for performing this task, equipping machines with this capability is an extremely challenging endeavor. In this paper, we investigate how deep learning techniques can allow us to translate high-level concepts such as motor imagery to the problem of robotic grasp synthesis. We explore a paradigm based on generative models for learning integrated object-action representations, and demonstrate its capacity for capturing and generating multimodal, multi-finger grasp configurations on a simulated grasping dataset.


Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

arXiv.org Machine Learning

Convolutional Neural Networks (CNNs) are effective models for reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Hybrid speech recognition systems incorporating CNNs with Hidden Markov Models/Gaussian Mixture Models (HMMs/GMMs) have achieved the state-of-the-art in various benchmarks. Meanwhile, Connectionist Temporal Classification (CTC) with Recurrent Neural Networks (RNNs), which was proposed for labeling unsegmented sequences, makes it feasible to train an 'end-to-end' speech recognition system instead of hybrid settings. However, RNNs are computationally expensive and sometimes difficult to train. In this paper, inspired by the advantages of both CNNs and the CTC approach, we propose an end-to-end speech framework for sequence labeling, by combining hierarchical CNNs with CTC directly without recurrent connections. By evaluating the approach on the TIMIT phoneme recognition task, we show that the proposed model is not only computationally efficient, but also competitive with the existing baseline systems. Moreover, we argue that CNNs have the capability to model temporal correlations with appropriate context information.


Choosing a Machine Learning Classifier

#artificialintelligence

How do you know what machine learning algorithm to choose for your classification problem? Of course, if you really care about accuracy, your best bet is to test out a couple different ones (making sure to try different parameters within each algorithm as well), and select the best one by cross-validation. But if you're simply looking for a "good enough" algorithm for your problem, or a place to start, here are some general guidelines I've found to work well over the years. If your training set is small, high bias/low variance classifiers (e.g., Naive Bayes) have an advantage over low bias/high variance classifiers (e.g., kNN), since the latter will overfit. But low bias/high variance classifiers start to win out as your training set grows (they have lower asymptotic error), since high bias classifiers aren't powerful enough to provide accurate models.
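The selection-by-cross-validation step described above can be sketched in a few lines. This is a minimal illustration, not the author's procedure: the helper names (`k_fold_indices`, `cross_val_accuracy`, `select_by_cv`), the two toy classifiers, and the 1-D data are all assumptions made for the example.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy of the classifier built by `fit` over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        predict = fit(train_X, train_y)
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return mean(scores)

def fit_majority(X, y):
    """Hypothetical high-bias baseline: always predict the most common label."""
    label = max(set(y), key=y.count)
    return lambda x: label

def fit_1nn(X, y):
    """Hypothetical low-bias classifier: 1-nearest-neighbour on 1-D inputs."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: abs(X[i] - x))
        return y[nearest]
    return predict

def select_by_cv(candidates, X, y, k=5):
    """Score every candidate by k-fold CV and return the best name plus all scores."""
    scores = {name: cross_val_accuracy(fit, X, y, k) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores

# Toy data (made up): class 0 near 0..4, class 1 near 10..14, interleaved
# so that every contiguous fold contains both classes.
X = [0.0, 10.0, 1.0, 11.0, 2.0, 12.0, 3.0, 13.0, 4.0, 14.0]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
best, scores = select_by_cv({"majority": fit_majority, "1nn": fit_1nn}, X, y, k=5)
```

On real data the folds would normally be shuffled (and often stratified) before splitting; contiguous folds are used here only to keep the sketch deterministic.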


Artificial intelligence

#artificialintelligence

Major AI researchers and textbooks define the field as "the study and design of intelligent agents", where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success. John McCarthy, who coined the term in 1955, defines it as "The science and engineering of making intelligent machines". AI research is highly technical and specialized, deeply divided into subfields that often fail to communicate with each other. Some of the division is due to social and cultural factors: subfields have grown up around particular institutions and the work of individual researchers. AI research is also divided by several technical issues.


On Reject and Refine Options in Multicategory Classification

arXiv.org Machine Learning

In many real applications of statistical learning, a decision made from misclassification can be too costly to afford; in this case, a reject option, which defers the decision until further investigation is conducted, is often preferred. In recent years, there has been much development for binary classification with a reject option. Yet, little progress has been made for the multicategory case. In this article, we propose margin-based multicategory classification methods with a reject option. In addition, and more importantly, we introduce a new and unique refine option for the multicategory problem, where the class of an observation is predicted to be from a set of class labels, whose cardinality is not necessarily one. The main advantage of both options lies in their capacity for identifying error-prone observations. Moreover, the refine option can provide more constructive information for classification by effectively ruling out implausible classes. Efficient implementations have been developed for the proposed methods. On the theoretical side, we offer a novel statistical learning theory and show a fast convergence rate of the excess $\ell$-risk of our methods with emphasis on diverging dimensionality and number of classes. The results can be further improved under a low noise assumption. A set of comprehensive simulation and real data studies has shown the usefulness of the new learning tools compared to regular multicategory classifiers. Detailed proofs of theorems and extended numerical results are included in the supplemental materials available online.
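To make the two options concrete, here is a minimal sketch of what a reject decision and a refine decision look like for a single observation. It assumes estimated class probabilities are already available and simply thresholds them; this is a stand-in for illustration, not the margin-based method the abstract proposes. The function names, `threshold`, and `coverage` are hypothetical.

```python
def predict_with_reject(probs, threshold=0.7):
    """Return the top label, or None (reject) when its probability is below threshold."""
    label = max(probs, key=probs.get)
    return label if probs[label] >= threshold else None

def predict_with_refine(probs, coverage=0.9):
    """Return the smallest set of top-ranked labels whose total probability reaches coverage."""
    chosen, total = [], 0.0
    for label in sorted(probs, key=probs.get, reverse=True):
        chosen.append(label)
        total += probs[label]
        if total >= coverage:
            break
    return set(chosen)

# Made-up probability estimates for two observations over classes a, b, c.
ambiguous = {"a": 0.5, "b": 0.45, "c": 0.05}
confident = {"a": 0.9, "b": 0.06, "c": 0.04}

reject_ambiguous = predict_with_reject(ambiguous)   # decision is deferred
refine_ambiguous = predict_with_refine(ambiguous)   # class c is ruled out
reject_confident = predict_with_reject(confident)
refine_confident = predict_with_refine(confident)
```

For the ambiguous observation the reject rule defers entirely, while the refine rule still rules out the implausible class and returns a set of two labels, which is the extra information the abstract attributes to the refine option.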


Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories

arXiv.org Machine Learning

For example, genes may regulate each other in different ways when external conditions are changed; the number of daily flu-like symptom reports in nearby hospitals may become correlated when a major epidemic disease breaks out; EEG signals from different regions of the brain may be synchronized/desynchronized when the subject is performing different activities. Spotting such changes in interactions may provide key insights into the underlying system. The interactions among random variables can be formulated as undirected probabilistic graphical models, or Markov Networks (MNs) [Koller and Friedman, 2009], which express the interactions via conditional independence. We consider a simple model: the pairwise MN, in which factors are defined only over single random variables or pairs of random variables. By the Hammersley-Clifford theorem [Hammersley and Clifford, 1971], the underlying joint probability density function can be represented as the product of univariate and bivariate factors.
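Written out, the pairwise factorization has the following form. The notation is assumed here for illustration and does not come from the excerpt: $d$ is the number of variables, $\phi_i$ and $\phi_{ij}$ are nonnegative univariate and bivariate factors, and $Z$ is the normalizing constant.

```latex
p(\mathbf{x}) = \frac{1}{Z} \prod_{i=1}^{d} \phi_i(x_i) \prod_{1 \le i < j \le d} \phi_{ij}(x_i, x_j),
\qquad
Z = \int \prod_{i=1}^{d} \phi_i(x_i) \prod_{1 \le i < j \le d} \phi_{ij}(x_i, x_j) \, \mathrm{d}\mathbf{x}.
```

A missing link between variables $i$ and $j$ corresponds to a constant factor $\phi_{ij}$, so a change in interaction shows up as a change in a bivariate factor.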


Bayesian model selection consistency and oracle inequality with intractable marginal likelihood

arXiv.org Machine Learning

In this article, we investigate large sample properties of model selection procedures in a general Bayesian framework when a closed form expression of the marginal likelihood function is not available or a local asymptotic quadratic approximation of the log-likelihood function does not exist. Under appropriate identifiability assumptions on the true model, we provide sufficient conditions for a Bayesian model selection procedure to be consistent and obey the Occam's razor phenomenon, i.e., the probability of selecting the "smallest" model that contains the truth tends to one as the sample size goes to infinity. In order to show that a Bayesian model selection procedure selects the smallest model containing the truth, we impose a prior anti-concentration condition, requiring the prior mass assigned by large models to a neighborhood of the truth to be sufficiently small. In a more general setting where the strong model identifiability assumption may not hold, we introduce the notion of local Bayesian complexity and develop oracle inequalities for Bayesian model selection procedures. Our Bayesian oracle inequality characterizes a trade-off between the approximation error and a Bayesian characterization of the local complexity of the model, illustrating the adaptive nature of averaging-based Bayesian procedures towards achieving an optimal rate of posterior convergence. Specific applications of the model selection theory are discussed in the context of high-dimensional nonparametric regression and density regression where the regression function or the conditional density is assumed to depend on a fixed subset of predictors. As a result of independent interest, we propose a general technique for obtaining upper bounds of certain small ball probability of stationary Gaussian processes.


Information Pursuit: A Bayesian Framework for Sequential Scene Parsing

arXiv.org Machine Learning

Despite enormous progress in object detection and classification, the problem of incorporating expected contextual relationships among object instances into modern recognition systems remains a key challenge. In this work we propose Information Pursuit, a Bayesian framework for scene parsing that combines prior models for the geometry of the scene and the spatial arrangement of object instances with a data model for the output of high-level image classifiers trained to answer specific questions about the scene. In the proposed framework, the scene interpretation is progressively refined as evidence accumulates from the answers to a sequence of questions. At each step, we choose the question to maximize the mutual information between the new answer and the full interpretation given the current evidence obtained from previous inquiries. We also propose a method for learning the parameters of the model from synthesized, annotated scenes obtained by top-down sampling from an easy-to-learn generative scene model. Finally, we introduce a database of annotated indoor scenes of dining room tables, which we use to evaluate the proposed approach.
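The question-selection rule can be illustrated on a toy discrete problem. The sketch below assumes a small finite set of interpretations, a known answer model P(answer | interpretation) for each question, and answers that are conditionally independent given the interpretation, so that the current posterior can serve as the prior for the next step. All names and numbers are hypothetical and are not taken from the paper.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in dist if p > 0)

def mutual_information(belief, likelihood):
    """I(interpretation; answer) for belief[h] and likelihood[h][a] = P(answer=a | h)."""
    n_answers = len(next(iter(likelihood.values())))
    marginal = [sum(belief[h] * likelihood[h][a] for h in belief) for a in range(n_answers)]
    conditional = sum(belief[h] * entropy(likelihood[h]) for h in belief)
    return entropy(marginal) - conditional

def next_question(belief, questions):
    """Pick the question whose answer is most informative under the current belief."""
    return max(questions, key=lambda q: mutual_information(belief, questions[q]))

def update(belief, likelihood, answer):
    """Bayes update of the belief after observing an answer index."""
    unnorm = {h: belief[h] * likelihood[h][answer] for h in belief}
    z = sum(unnorm.values())
    return {h: v / z for h, v in unnorm.items()}

# Two candidate interpretations and two questions with binary answers (made up).
belief = {"h1": 0.5, "h2": 0.5}
questions = {
    "informative": {"h1": [0.9, 0.1], "h2": [0.1, 0.9]},
    "useless": {"h1": [0.5, 0.5], "h2": [0.5, 0.5]},
}
chosen = next_question(belief, questions)
belief_after = update(belief, questions[chosen], answer=0)
```

Repeating `next_question` and `update` in a loop gives the progressive refinement described above: each answer sharpens the belief, and the next question is chosen against that sharpened belief.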


Stratification of patient trajectories using covariate latent variable models

arXiv.org Machine Learning

Standard models assign disease progression to discrete categories or stages based on well-characterized clinical markers. However, such a system is potentially at odds with our understanding of the underlying biology, which in highly complex systems may support a (near-)continuous evolution of disease from inception to terminal state. To learn such a continuous disease score one could infer a latent variable from dynamic "omics" data such as RNA-seq that correlates with an outcome of interest such as survival time. However, such analyses may be confounded by additional data such as clinical covariates measured in electronic health records (EHRs). As a solution to this we introduce covariate latent variable models, a novel type of latent variable model that learns a low-dimensional data representation in the presence of two (asymmetric) views of the same data source. We apply our model to TCGA colorectal cancer RNA-seq data and demonstrate how incorporating microsatellite-instability (MSI) status as an external covariate allows us to identify genes that stratify patients on an immune-response trajectory. Finally, we propose an extension termed Covariate Gaussian Process Latent Variable Models for learning nonparametric, nonlinear representations. An R package implementing variational inference for covariate latent variable models is available at http://github.com/kieranrcampbell/clvm.