Comparative Gene Prediction using Conditional Random Fields
Vinson, Jade P., Decaprio, David, Pearson, Matthew D., Luoma, Stacey, Galagan, James E.
Neural Information Processing Systems
Computational gene prediction using generative models has reached a plateau, with several groups converging on a generalized hidden Markov model (GHMM) incorporating phylogenetic models of nucleotide sequence evolution. Further improvements in gene calling accuracy are likely to come through new methods that incorporate additional data, both comparative and species-specific. Conditional Random Fields (CRFs), which directly model the conditional probability P(y | x) of a vector of hidden states conditioned on a set of observations, provide a unified framework for combining probabilistic and non-probabilistic information and have been shown to outperform HMMs on sequence labeling tasks in natural language processing. We describe the use of CRFs for comparative gene prediction. We implement a model that encapsulates both a phylogenetic-GHMM (our baseline comparative model) and additional non-probabilistic features. We tested our model on the genome sequence of the fungal human pathogen Cryptococcus neoformans.
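To make the quantity P(y | x) concrete, the following is a minimal sketch of a plain linear-chain CRF over a toy DNA string. It is not the model described in the abstract, which builds on a phylogenetic GHMM. The label set, the three feature functions, and their weights are all hypothetical and chosen only for illustration. The sketch shows how weighted features, which need not be probabilities, are summed into a path score and normalized by a partition function computed with the forward algorithm.

```python
import itertools
import math

# Hypothetical label set; real gene models use many more states.
STATES = ["intergenic", "exon", "intron"]

# Hypothetical feature functions f(prev, cur, x, i) -> float.
def f_same_state(prev, cur, x, i):
    return 1.0 if prev == cur else 0.0

def f_gc_in_exon(prev, cur, x, i):
    return 1.0 if cur == "exon" and x[i] in "GC" else 0.0

def f_start_codon(prev, cur, x, i):
    # Fires when an exon begins at an ATG.
    return 1.0 if prev != "exon" and cur == "exon" and x[i:i + 3] == "ATG" else 0.0

# (feature, weight) pairs; the weights are made up, not trained.
WEIGHTS = [(f_same_state, 1.5), (f_gc_in_exon, 0.7), (f_start_codon, 2.0)]

def score(prev, cur, x, i, weights):
    """Weighted feature sum for labeling position i as cur after prev."""
    return sum(w * f(prev, cur, x, i) for f, w in weights)

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def path_score(y, x, weights):
    """Unnormalized log score of one complete labeling y."""
    total, prev = 0.0, None
    for i, cur in enumerate(y):
        total += score(prev, cur, x, i, weights)
        prev = cur
    return total

def log_partition(x, weights):
    """log Z(x) via the forward algorithm, in log space."""
    alpha = {s: score(None, s, x, 0, weights) for s in STATES}
    for i in range(1, len(x)):
        alpha = {
            cur: log_sum_exp([alpha[p] + score(p, cur, x, i, weights) for p in STATES])
            for cur in STATES
        }
    return log_sum_exp(list(alpha.values()))

def conditional_prob(y, x, weights):
    """P(y | x) = exp(path_score(y, x)) / Z(x)."""
    return math.exp(path_score(y, x, weights) - log_partition(x, weights))

def total_probability(x, weights):
    """Brute-force sanity check: P(y | x) summed over every labeling."""
    return sum(
        conditional_prob(list(y), x, weights)
        for y in itertools.product(STATES, repeat=len(x))
    )

x = "CATGGC"
y = ["intergenic", "exon", "exon", "exon", "exon", "exon"]
print(conditional_prob(y, x, WEIGHTS))
print(total_probability(x, WEIGHTS))  # should be 1 up to rounding error
```

Because the model is normalized over whole label sequences rather than per state, the features may overlap or be arbitrary real-valued scores, which is what allows probabilistic and non-probabilistic evidence to be combined in one framework.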
Dec-31-2007