Comparative Gene Prediction using Conditional Random Fields

Vinson, Jade P., Decaprio, David, Pearson, Matthew D., Luoma, Stacey, Galagan, James E.

Neural Information Processing Systems 

Computational gene prediction using generative models has reached a plateau, with several groups converging to a generalized hidden Markov model (GHMM) incorporating phylogenetic models of nucleotide sequence evolution. Further improvements in gene calling accuracy are likely to come through new methods that incorporate additional data, both comparative and species-specific. Conditional Random Fields (CRFs), which directly model the conditional probability P(y | x) of a vector of hidden states conditioned on a set of observations, provide a unified framework for combining probabilistic and non-probabilistic information and have been shown to outperform HMMs on sequence labeling tasks in natural language processing. We describe the use of CRFs for comparative gene prediction. We implement a model that encapsulates both a phylogenetic-GHMM (our baseline comparative model) and additional non-probabilistic features. We tested our model on the genome sequence of the fungal human pathogen Cryptococcus neoformans.
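
To illustrate what it means to model P(y | x) directly, the following minimal Python sketch scores every labeling of a short sequence under a toy linear-chain CRF and normalizes by brute-force enumeration. The label set, feature templates, weights, and function names are hypothetical and are not taken from the paper; the model described above, which encapsulates a phylogenetic GHMM, is far richer than this toy.

```python
import itertools
import math

# Hypothetical label set for illustration: non-coding (N) vs. coding (C).
LABELS = ("N", "C")

# Toy feature weights (assumed values, not from the paper).
WEIGHTS = {
    ("obs", "C", "G"): 1.0,    # label C emitted alongside nucleotide G
    ("obs", "N", "A"): 0.5,    # label N emitted alongside nucleotide A
    ("trans", "C", "C"): 0.8,  # staying in a coding region
    ("trans", "N", "N"): 0.8,  # staying in a non-coding region
}


def score(labels, observations, weights):
    """Unnormalized log-score: sum of weights of the features that fire."""
    total = 0.0
    for i, (y, x) in enumerate(zip(labels, observations)):
        # Observation feature on (label, observed symbol).
        total += weights.get(("obs", y, x), 0.0)
        # Transition feature on (previous label, current label).
        if i > 0:
            total += weights.get(("trans", labels[i - 1], y), 0.0)
    return total


def conditional_probability(labels, observations, weights):
    """P(y | x) = exp(score(y, x)) / Z(x), where Z(x) sums over all labelings."""
    z = sum(
        math.exp(score(candidate, observations, weights))
        for candidate in itertools.product(LABELS, repeat=len(observations))
    )
    return math.exp(score(labels, observations, weights)) / z


def all_probabilities(observations, weights=WEIGHTS):
    """Map every possible labeling of the observations to its P(y | x)."""
    return {
        candidate: conditional_probability(candidate, observations, weights)
        for candidate in itertools.product(LABELS, repeat=len(observations))
    }


if __name__ == "__main__":
    for labeling, p in sorted(all_probabilities("GGA").items(), key=lambda kv: -kv[1]):
        print("".join(labeling), round(p, 4))
```

Because the weights are arbitrary real numbers rather than probabilities, features of this kind can mix probabilistic scores with non-probabilistic evidence. Enumerating every labeling is exponential in sequence length and is used here only for clarity; practical implementations compute the normalizer with dynamic programming.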
