Gene Expression Time Course Clustering with Countably Infinite Hidden Markov Models

Beal, Matthew; Krishnamurthy, Praveen

arXiv.org Machine Learning 

Genes that cluster with similar expression (that is, genes that are co-expressed) are widely held to serve similar functional roles in a process (see, for example, Eisen et al. 1998). Bioinformaticians have more recently had access to sets of time-series measurements of genes' expression over the duration of an experiment, and have therefore sought to learn not just co-expression but also causal relationships that may help elucidate co-regulation. Two issues hamper practical methods for clustering gene expression time course data: first, when deriving a model-based clustering metric, it is often unclear what the appropriate model complexity should be; second, the clustering algorithms currently available cannot handle, and therefore disregard, the temporal information. The temporal information is usually lost when constructing a metric for the distance between any two genes. The common practice for an experiment having T measurements of a gene's expression over time is to treat the expression as a point in a T-dimensional space, and to perform clustering (at worst with a spherical metric) in that space. The result is that the clustering algorithm is invariant to arbitrary permutations of the time points, which is highly undesirable, since we would like to take into account the correlations between the genes' expression at nearby or adjacent time points.
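The permutation invariance described above can be illustrated with a minimal sketch. It assumes plain Euclidean distance as the spherical metric, and the two expression profiles (`gene_a`, `gene_b`) are made-up values chosen for illustration, not data from the paper. Applying the same shuffle of time points to both profiles leaves their distance unchanged, so a clustering built on this metric cannot see temporal order.

```python
import math
import random


def euclidean(x, y):
    """Euclidean (spherical-metric) distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def permute(series, order):
    """Reorder the time points of a profile according to `order`."""
    return [series[i] for i in order]


# Hypothetical expression profiles over T = 6 time points (illustrative values only).
gene_a = [0.1, 0.5, 1.2, 2.0, 1.1, 0.4]
gene_b = [0.0, 0.7, 1.0, 1.8, 1.3, 0.2]

# Draw one arbitrary permutation of the time indices and apply it to both genes.
rng = random.Random(0)
order = list(range(len(gene_a)))
rng.shuffle(order)

d_original = euclidean(gene_a, gene_b)
d_shuffled = euclidean(permute(gene_a, order), permute(gene_b, order))

# The two distances agree up to floating-point rounding.
print(math.isclose(d_original, d_shuffled))
```

Any metric that sums independent per-time-point terms behaves this way, which is why a model that represents the dependence between adjacent time points is needed to exploit the ordering.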
