Data Mining for Regulatory Elements in Yeast Genome

AAAI Conferences

The first complete genomes have recently been sequenced and published, including the first eukaryotic ...

* The results were obtained while the author was working at the Department of Computer Science, University of Helsinki.


An Exact Method for Finding Short Motifs in Sequences, with Application to the Ribosome Binding Site Problem

AAAI Conferences

This is an investigation of methods for finding short motifs that only occur in a fraction of the input sequences. Unlike local search techniques that may not reach a global optimum, the method proposed here is guaranteed to produce the motifs with greatest z-scores.
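To illustrate why exhaustive enumeration yields the top-scoring motifs with certainty, here is a minimal Python sketch. It is not the paper's algorithm or its statistical model, which this excerpt does not describe. The function name `top_motifs_by_zscore`, the i.i.d. background model estimated from the input itself, and the per-sequence occurrence approximation (which ignores overlap effects) are all assumptions made for illustration. Because every possible k-mer is scored, the highest z-score under this toy model cannot be missed, whereas a local search may stop at a local optimum.

```python
from itertools import product
from math import prod, sqrt


def top_motifs_by_zscore(seqs, k, top=3):
    """Score every DNA k-mer by the z-score of the number of input
    sequences that contain it, and return the `top` best (z, kmer) pairs.

    Assumptions (illustrative only): uppercase ACGT input, an i.i.d.
    background whose base frequencies are estimated from the input, and
    independence between candidate positions within a sequence.
    """
    total = sum(len(s) for s in seqs)
    freq = {b: sum(s.count(b) for s in seqs) / total for b in "ACGT"}

    scored = []
    # Exhaustive enumeration: all 4**k candidate motifs are examined.
    for kmer in map("".join, product("ACGT", repeat=k)):
        # Observed count: sequences with at least one occurrence, so a
        # motif present in only a fraction of the input can still score well.
        observed = sum(kmer in s for s in seqs)

        # Probability of the k-mer at one fixed position under the background.
        p_site = prod(freq[b] for b in kmer)
        # Approximate probability that each sequence contains the k-mer.
        p_seq = [1 - (1 - p_site) ** max(len(s) - k + 1, 0) for s in seqs]

        mean = sum(p_seq)
        var = sum(p * (1 - p) for p in p_seq)
        if var == 0:
            continue  # k-mer is impossible under the background; skip it
        scored.append(((observed - mean) / sqrt(var), kmer))

    scored.sort(reverse=True)
    return scored[:top]


if __name__ == "__main__":
    toy = ["ACGTTGACAACG", "TTGACATTTT", "CCCCTTGACA", "GGGGGGGGGG"]
    for z, kmer in top_motifs_by_zscore(toy, k=6):
        print(kmer, round(z, 2))
```

The cost grows as 4^k times the input size, which is why this kind of exact search is only practical for short motifs.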


Clustering sequence sets for motif discovery

Neural Information Processing Systems

Most existing methods for DNA motif discovery consider only a single set of sequences in which to find an over-represented motif. In contrast, we consider multiple sets of sequences and group the sets associated with the same motif into a cluster, assuming that each set involves a single motif. Clustering sets of sequences yields clusters of coherent motifs, improving the signal-to-noise ratio or enabling us to identify multiple motifs. We present a probabilistic model for DNA motif discovery that identifies multiple motifs by searching for patterns shared across multiple sets of sequences. Our model infers cluster-indicating latent variables and learns motifs simultaneously, and these two tasks interact with each other. We show that our model can handle various motif discovery problems, depending on how the multiple sets of sequences are constructed. Experiments on three different DNA motif discovery problems demonstrate the model's useful behavior and confirm substantial gains over existing methods that consider only a single set of sequences.



A Statistical Method for Finding Transcription Factor Binding Sites

AAAI Conferences

Let the random variable $X_s$ be the number of occurrences of the motif $s$ in $X$, and let $E(X_s)$ and $\sigma(X_s)$ be its mean and standard deviation, respectively.
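A z-score built from these quantities is conventionally the observed count standardized by the mean and standard deviation. The following is a sketch of that standard form, not a quotation from the paper; the symbol $N_s$, denoting the number of occurrences of $s$ observed in the actual input sequences, is an assumption introduced here because the excerpt does not name it.

```latex
z_s = \frac{N_s - E(X_s)}{\sigma(X_s)}
```

A large positive $z_s$ indicates that $s$ occurs far more often than the background model predicts.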