Technology
Probabilistic Inference of Alternative Splicing Events in Microarray Data
Shai, Ofer, Frey, Brendan J., Morris, Quaid D., Pan, Qun, Misquitta, Christine, Blencowe, Benjamin J.
Alternative splicing (AS) is an important and frequent step in mammalian gene expression that allows a single gene to specify multiple products, and is crucial for the regulation of fundamental biological processes. The extent of AS regulation, and the mechanisms involved, are not well understood. We have developed a custom DNA microarray platform for surveying AS levels on a large scale. We present here a generative model for the AS Array Platform (GenASAP) and demonstrate its utility for quantifying AS levels in different mouse tissues. Learning is performed using a variational expectation maximization algorithm, and the parameters are shown to correctly capture expected AS trends. A comparison of the results obtained with a well-established but low throughput experimental method demonstrate that AS levels obtained from GenASAP are highly predictive of AS levels in mammalian tissues.
Real-Time Pitch Determination of One or More Voices by Nonnegative Matrix Factorization
An auditory "scene", composed of overlapping acoustic sources, can be viewed as a complex object whose constituent parts are the individual sources. Pitch is known to be an important cue for auditory scene analysis. In this paper, with the goal of building agents that operate in human environments, we describe a real-time system to identify the presence of one or more voices and compute their pitch. The signal processing in the front end is based on instantaneous frequency estimation, a method for tracking the partials of voiced speech, while the pattern-matching in the back end is based on nonnegative matrix factorization, an unsupervised algorithm for learning the parts of complex objects. While supporting a framework to analyze complicated auditory scenes, our system maintains real-time operability and state-of-the-art performance in clean speech.
On the Adaptive Properties of Decision Trees
Decision trees are surprisingly adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; (3) reject irrelevant features. In this paper we examine a decision tree based on dyadic splits that adapts to each of these conditions to achieve minimax optimal rates of convergence. The proposed classifier is the first known to achieve these optimal rates while being practical and implementable.
Assignment of Multiplicative Mixtures in Natural Images
Schwartz, Odelia, Sejnowski, Terrence J., Dayan, Peter
In the analysis of natural images, Gaussian scale mixtures (GSM) have been used to account for the statistics of filter responses, and to inspire hierarchical cortical representational learning schemes. GSMs pose a critical assignment problem, working out which filter responses were generated by a common multiplicative factor. We present a new approach to solving this assignment problem through a probabilistic extension to the basic GSM, and show how to perform inference in the model using Gibbs sampling. We demonstrate the efficacy of the approach on both synthetic and image data. Understanding the statistical structure of natural images is an important goal for visual neuroscience. Neural representations in early cortical areas decompose images (and likely other sensory inputs) in a way that is sensitive to sophisticated aspects of their probabilistic structure. This structure also plays a key role in methods for image processing and coding. A striking aspect of natural images that has reflections in both top-down and bottom-up modeling is coordination across nearby locations, scales, and orientations. From a topdown perspective, this structure has been modeled using what is known as a Gaussian Scale Mixture model (GSM).
Learning Gaussian Process Kernels via Hierarchical Bayes
Schwaighofer, Anton, Tresp, Volker, Yu, Kai
We present a novel method for learning with Gaussian process regression in a hierarchical Bayesian framework. In a first step, kernel matrices on a fixed set of input points are learned from data using a simple and efficient EM algorithm. This step is nonparametric, in that it does not require a parametric form of covariance function. In a second step, kernel functions are fitted to approximate the learned covariance matrix using a generalized Nystr om method, which results in a complex, data driven kernel. We evaluate our approach as a recommendation engine for art images, where the proposed hierarchical Bayesian method leads to excellent prediction performance.
Edge of Chaos Computation in Mixed-Mode VLSI - A Hard Liquid
Schรผrmann, Felix, Meier, Karlheinz, Schemmel, Johannes
Computation without stable states is a computing paradigm different from Turing's and has been demonstrated for various types of simulated neural networks. This publication transfers this to a hardware implemented neural network. Results of a software implementation are reproduced showing that the performance peaks when the network exhibits dynamics at the edge of chaos. The liquid computing approach seems well suited for operating analog computing devices such as the used VLSI neural network.
Semi-parametric Exponential Family PCA
Sajama, Sajama, Orlitsky, Alon
We present a semi-parametric latent variable model based technique for density modelling, dimensionality reduction and visualization. Unlike previous methods, we estimate the latent distribution non-parametrically which enables us to model data generated by an underlying low dimensional, multimodal distribution. In addition, we allow the components of latent variable models to be drawn from the exponential family which makes the method suitable for special data types, for example binary or count data. Simulations on real valued, binary and count data show favorable comparison to other related schemes both in terms of separating different populations and generalization to unseen samples.
A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
Rosset, Saharon, Zhu, Ji, Zou, Hui, Hastie, Trevor J.
We consider the situation in semi-supervised learning, where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: a. As an input to a supervised learning procedure which can be used to "de-bias" its results using labeled data only and b.