Goto

Collaborating Authors

 Learning Graphical Models


A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model

arXiv.org Machine Learning

We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We use a biophysical model which shows that the fluorescence ratio under the log scale has a sparse linear relationship with the coefficients describing the accessibility of each nucleotide, since not all sites are accessible (due to the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist $\ell_1$ regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before experimental budget is exhausted. Besides, we also develop two other new algorithms: batch SpKG algorithm, which generates more suggestions sequentially to run parallel experiments; and batch SpKG with a procedure which we call length mutagenesis. It dynamically adds in new alternatives, in the form of types of probes, are created by inserting, deleting or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Group I intron (a mid-size RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.


Universal Approximation of Edge Density in Large Graphs

arXiv.org Machine Learning

With the recent availability of much network data, such as world wide web, social networks, phone call networks, science collaboration graphs [1], [2], there is a renewed interest for the graph partitioning problem, especially for the automatic discovery of community structures in large networks [3], [4], [5]. Beyond clustering approaches, coclustering approaches aim at summarizing the relation between two entities in a many-to-many relationship. Such a relation can be represented as a graph, where the source and target vertices represent entities and the edges stand for relations between entities. A coclustering model provides a summary of a graph by grouping source vertices and target vertices. For example, in market analysis, the source vertices of the graph represent customers, the target vertices represent products and there is one edge each time a customer has purchased a product. A coclustering model summarizes the dataset by grouping customers that have purchased approximately the same products and grouping products that have been purchased by approximately the same customers. Coclustering models have been applied to many other domains, such as information retrieval (the entities are documents and their words in a text corpus), web log analysis (cookies and their visited web pages), web structure analysis (web pages with hyperlinks between them) or telecommunication network (the call detail records stand for the edges in a call graph between a caller and a called party). All these real-world graphs are directed multigraphs, meaning that two entities may be linked by multi-edges. We aim to summarize and discover insightful patterns in such graphs, using a method with the desired following properties: 1) Robustness, to avoid detecting spurious patterns in case of noisy data.


A Gauss-Newton Method for Markov Decision Processes

arXiv.org Machine Learning

Approximate Newton methods are a standard optimization tool which aim to maintain the benefits of Newton's method, such as a fast rate of convergence, whilst alleviating its drawbacks, such as computationally expensive calculation or estimation of the inverse Hessian. In this work we investigate approximate Newton methods for policy optimization in Markov Decision Processes (MDPs). We first analyse the structure of the Hessian of the objective function for MDPs. We show that, like the gradient, the Hessian exhibits useful structure in the context of MDPs and we use this analysis to motivate two Gauss-Newton Methods for MDPs. Like the Gauss-Newton method for non-linear least squares, these methods involve approximating the Hessian by ignoring certain terms in the Hessian which are difficult to estimate. The approximate Hessians possess desirable properties, such as negative definiteness, and we demonstrate several important performance guarantees including guaranteed ascent directions, invariance to affine transformation of the parameter space, and convergence guarantees. We finally provide a unifying perspective of key policy search algorithms, demonstrating that our second Gauss-Newton algorithm is closely related to both the EM-algorithm and natural gradient ascent applied to MDPs, but performs significantly better in practice on a range of challenging domains.


A MAP approach for $\ell_q$-norm regularized sparse parameter estimation using the EM algorithm

arXiv.org Machine Learning

In this paper, Bayesian parameter estimation through the consideration of the Maximum A Posteriori (MAP) criterion is revisited under the prism of the Expectation-Maximization (EM) algorithm. By incorporating a sparsity-promoting penalty term in the cost function of the estimation problem through the use of an appropriate prior distribution, we show how the EM algorithm can be used to efficiently solve the corresponding optimization problem. To this end, we rely on variance-mean Gaussian mixtures (VMGM) to describe the prior distribution, while we incorporate many nice features of these mixtures to our estimation problem. The corresponding MAP estimation problem is completely expressed in terms of the EM algorithm, which allows for handling nonlinearities and hidden variables that cannot be easily handled with traditional methods. For comparison purposes, we also develop a Coordinate Descent algorithm for the $\ell_q$-norm penalized problem and present the performance results via simulations.


Progressive EM for Latent Tree Models and Hierarchical Topic Detection

arXiv.org Machine Learning

Hierarchical latent tree analysis (HLTA) is recently proposed as a new method for topic detection. It differs fundamentally from the LDA-based methods in terms of topic definition, topic-document relationship, and learning method. It has been shown to discover significantly more coherent topics and better topic hierarchies. However, HLTA relies on the Expectation-Maximization (EM) algorithm for parameter estimation and hence is not efficient enough to deal with large datasets. In this paper, we propose a method to drastically speed up HLTA using a technique inspired by recent advances in the moments method. Empirical experiments show that our method greatly improves the efficiency of HLTA. It is as efficient as the state-of-the-art LDA-based method for hierarchical topic detection and finds substantially better topics and topic hierarchies.


Unsupervised Learning in Genome Informatics

arXiv.org Machine Learning

With different genomes available, unsupervised learning algorithms are essential in learning genome-wide biological insights. Especially, the functional characterization of different genomes is essential for us to understand lives. In this book chapter, we review the state-of-the-art unsupervised learning algorithms for genome informatics from DNA to MicroRNA. DNA (DeoxyriboNucleic Acid) is the basic component of genomes. A significant fraction of DNA regions (transcription factor binding sites) are bound by proteins (transcription factors) to regulate gene expression at different development stages in different tissues. To fully understand genetics, it is necessary of us to apply unsupervised learning algorithms to learn and infer those DNA regions. Here we review several unsupervised learning methods for deciphering the genome-wide patterns of those DNA regions. MicroRNA (miRNA), a class of small endogenous non-coding RNA (RiboNucleic acid) species, regulate gene expression post-transcriptionally by forming imperfect base-pair with the target sites primarily at the 3$'$ untranslated regions of the messenger RNAs. Since the 1993 discovery of the first miRNA \emph{let-7} in worms, a vast amount of studies have been dedicated to functionally characterizing the functional impacts of miRNA in a network context to understand complex diseases such as cancer. Here we review several representative unsupervised learning frameworks on inferring miRNA regulatory network by exploiting the static sequence-based information pertinent to the prior knowledge of miRNA targeting and the dynamic information of miRNA activities implicated by the recently available large data compendia, which interrogate genome-wide expression profiles of miRNAs and/or mRNAs across various cell conditions.


Tag-Weighted Topic Model For Large-scale Semi-Structured Documents

arXiv.org Machine Learning

To date, there have been massive Semi-Structured Documents (SSDs) during the evolution of the Internet. These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous works focused on modeling the unstructured text, and recently, some other methods have been proposed to model the unstructured text with specific tags. To build a general model for SSDs remains an important problem in terms of both model fitness and efficiency. We propose a novel method to model the SSDs by a so-called Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both the tags and words information, not only to learn the document-topic and topic-word distributions, but also to infer the tag-topic distributions for text mining tasks. We present an efficient variational inference method with an EM algorithm for estimating the model parameters. Meanwhile, we propose three large-scale solutions for our model under the MapReduce distributed computing platform for modeling large-scale SSDs. The experimental results show the effectiveness, efficiency and the robustness by comparing our model with the state-of-the-art methods in document modeling, tags prediction and text classification. We also show the performance of the three distributed solutions in terms of time and accuracy on document modeling.


Context-aware learning for finite mixture models

arXiv.org Machine Learning

This work introduces algorithms able to exploit contextual information in order to improve maximum-likelihood (ML) parameter estimation in finite mixture models (FMM), demonstrating their benefits and properties in several scenarios. The proposed algorithms are derived in a probabilistic framework with regard to situations where the regular FMM graphs can be extended with context-related variables, respecting the standard expectation-maximization (EM) methodology and, thus, rendering explicit supervision completely redundant. We show that, by direct application of the missing information principle, the compared algorithms' learning behaviour operates between the extremities of supervised and unsupervised learning, proportionally to the information content of contextual assistance. Our simulation results demonstrate the superiority of context-aware FMM training as compared to conventional unsupervised training in terms of estimation precision, standard errors, convergence rates and classification accuracy or regression fitness in various scenarios, while also highlighting important differences among the outlined situations. Finally, the improved classification outcome of contextually enhanced FMMs is showcased in a brain-computer interface application scenario.


An Analytically Tractable Bayesian Approximation to Optimal Point Process Filtering

arXiv.org Machine Learning

The process of dynamic state estimation (filtering) based on point process observations is in general intractable. Numerical sampling techniques are often practically useful, but lead to limited conceptual insight about optimal encoding/decoding strategies, which are of significant relevance to Computational Neuroscience. We develop an analytically tractable Bayesian approximation to optimal filtering based on point process observations, which allows us to introduce distributional assumptions about sensory cell properties, that greatly facilitates the analysis of optimal encoding in situations deviating from common assumptions of uniform coding. The analytic framework leads to insights which are difficult to obtain from numerical algorithms, and is consistent with experiments about the distribution of tuning curve centers. Interestingly, we find that the information gained from the absence of spikes may be crucial to performance.


Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm

arXiv.org Machine Learning

We propose an extensive analysis of the behavior of majority votes in binary classification. In particular, we introduce a risk bound for majority votes, called the C-bound, that takes into account the average quality of the voters and their average disagreement. We also propose an extensive PAC-Bayesian analysis that shows how the C-bound can be estimated from various observations contained in the training data. The analysis intends to be self-contained and can be used as introductory material to PAC-Bayesian statistical learning theory. It starts from a general PAC-Bayesian perspective and ends with uncommon PAC-Bayesian bounds. Some of these bounds contain no Kullback-Leibler divergence and others allow kernel functions to be used as voters (via the sample compression setting). Finally, out of the analysis, we propose the MinCq learning algorithm that basically minimizes the C-bound. MinCq reduces to a simple quadratic program. Aside from being theoretically grounded, MinCq achieves state-of-the-art performance, as shown in our extensive empirical comparison with both AdaBoost and the Support Vector Machine.