Analysis of Gene Expression Data with Pathway Scores

AAAI Conferences

Large scale gene expression measurements can now be performed by several established techniques, including EST (expressed sequence tag) sequencing, clustering and counting, e.g.


Towards a complete map of the protein space based on a unified sequence and structure analysis of all known proteins

AAAI Conferences

These techniques have been carefully optimized for detecting remote homologies and have been thoroughly studied and tested. Almost all of them are based on pairwise comparisons of the query sequence with the sequences in one or more of the sequence databases.


Mining for putative regulatory elements in the yeast genome using gene expression data

AAAI Conferences

We have developed a set of methods and tools for automatic discovery of putative regulatory signals in genome sequences. The analysis pipeline consists of gene expression data clustering, sequence pattern discovery from upstream sequences of genes, a control experiment to determine the pattern significance threshold, selection of interesting patterns, grouping of these patterns, representing the pattern groups in a concise form, and evaluating the discovered putative signals against existing databases of regulatory signals. Pattern discovery is the most crucial and computationally most expensive step. Our tool performs a rapid exhaustive search for a priori unknown, statistically significant sequence patterns of unrestricted length. The statistical significance is determined for the set of sequences in each cluster with respect to a set of background sequences, allowing the detection of subtle regulatory signals specific to each cluster.
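As a rough illustration of how the significance of a pattern in one cluster could be judged against background sequences, the sketch below uses a binomial tail probability. The source does not state which statistic the tool uses; the binomial model and the names `binomial_tail` and `pattern_pvalue` are assumptions made for this example only.

```python
from math import comb


def binomial_tail(k, n, p):
    """Probability of at least k successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pattern_pvalue(pattern, cluster, background):
    """Hypothetical significance score for a pattern in one cluster.

    k is the number of cluster sequences containing the pattern,
    p is the fraction of background sequences containing it.
    """
    k = sum(pattern in seq for seq in cluster)
    p = sum(pattern in seq for seq in background) / len(background)
    return binomial_tail(k, len(cluster), p)


# Toy sequences, invented for illustration.
cluster = ["ACGT", "TACG", "GGGG"]
background = ["ACGT", "TTTT", "CCCC", "GGGG"]
score = pattern_pvalue("ACG", cluster, background)
```

A smaller score means the pattern occurs in the cluster more often than the background frequency would suggest.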


A Multiple Alignment Algorithm for Metabolic Pathway Analysis using Enzyme Hierarchy

AAAI Conferences

Keywords: alignment, metabolic pathway, pathway analysis, enzyme, EC number
Abstract: In many of the chemical reactions in living cells, enzymes act as catalysts in the conversion of certain compounds (substrates) into other compounds (products). Comparative analyses of the metabolic pathways formed by such reactions give important information on their evolution and on pharmacological targets (Dandekar et al. 1999). Each of the enzymes that constitute a pathway is classified according to the EC (Enzyme Commission) numbering system, which consists of four sets of numbers that categorize the type of the chemical reaction catalyzed. In this study, we consider that reaction similarities can be expressed by the similarities between EC numbers of the respective enzymes. Therefore, in order to find a common pattern among pathways, it is desirable to be able to use the functional hierarchy of EC numbers to express the reaction similarities.
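One simple way to turn the EC hierarchy into a reaction similarity is to count how many leading fields two EC numbers share. The following is a minimal sketch; the function name `ec_similarity` and the scoring rule (shared leading levels divided by four) are assumptions for illustration and are not necessarily the scoring used in the study.

```python
def ec_similarity(ec_a, ec_b):
    """Fraction of leading EC levels shared by two EC numbers (0.0 to 1.0).

    Assumes both arguments are fully specified four-field strings
    such as "1.1.1.1".
    """
    shared = 0
    for field_a, field_b in zip(ec_a.split("."), ec_b.split(".")):
        if field_a != field_b:
            break  # the hierarchy diverges at this level
        shared += 1
    return shared / 4


# Two enzymes in the same sub-subclass differ only in the last field.
close = ec_similarity("1.1.1.1", "1.1.1.2")
# Enzymes in different top-level classes share nothing.
far = ec_similarity("1.1.1.1", "2.7.1.1")
```

Comparing fields from the left reflects the hierarchy: a mismatch at a higher level outweighs any agreement at lower levels.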


Sequence Database Search Using Jumping Alignments

AAAI Conferences

Asterisks denote amino acids, dots denote existing gaps in the alignment, dashes denote gaps that are introduced by the jumping alignment procedure.


Linear Modeling of Genetic Networks from Experimental Data

AAAI Conferences

Keywords: Genetic Networks, Quasi-Linear Model, Clustering
Abstract: In this paper, the regulatory interactions between genes are modeled by a linear genetic network that is estimated from gene expression data. The inference of such a genetic network is hampered by the dimensionality problem. This problem is inherent in all gene expression data, since the number of genes by far exceeds the number of measured time points. Consequently, there are infinitely many solutions that fit the data set perfectly. In this paper, this problem is tackled by combining genes with similar expression profiles into a single prototypical 'gene'. Instead of modeling the genes individually, the relations between prototypical genes are modeled. In this way, genes that cannot be distinguished based on their expression profiles are grouped together and their common control action is modeled instead. This process reduces the number of signals and imposes a structure on the model that is supported by the fact that biological genetic networks are thought to be redundant and sparsely connected. In essence, the ambiguity in model solutions is represented explicitly by providing a generalized model that expresses the basic regulatory interactions between groups of similarly expressed genes. The modeling approach is illustrated on artificial as well as real data.
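The step of merging genes with similar expression profiles into prototypical genes can be sketched as follows. This is only an illustration under stated assumptions: the greedy grouping, the Pearson correlation threshold of 0.95, and the names `pearson` and `build_prototypes` are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def build_prototypes(profiles, threshold=0.95):
    """Greedily group genes whose profiles correlate with a group's first member,
    then average each group into one prototypical profile."""
    groups = []
    for gene, profile in profiles.items():
        for group in groups:
            if pearson(profile, profiles[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])  # no similar group found: start a new one
    prototypes = []
    for group in groups:
        n_points = len(profiles[group[0]])
        mean_profile = [
            sum(profiles[g][t] for g in group) / len(group) for t in range(n_points)
        ]
        prototypes.append((group, mean_profile))
    return prototypes


# Toy expression profiles over four time points, invented for illustration.
profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
prototypes = build_prototypes(profiles)
```

With fewer prototype signals than original genes, a linear model relating the prototypes has fewer unknowns to estimate from the same number of time points.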


A Statistical Method for Finding Transcription Factor Binding Sites

AAAI Conferences

Let the random variable X_s be the number of occurrences of the motif s in X, and let E(X_s) and σ(X_s) be its mean and standard deviation, respectively.
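A mean and standard deviation of the motif count are commonly combined into a z-score that expresses how unusual an observed count is. The sketch below is an assumption about how these quantities might be used, not the method of the paper: it estimates the mean and standard deviation empirically from a list of background counts, and the names `count_occurrences` and `z_score` are hypothetical.

```python
from statistics import mean, pstdev


def count_occurrences(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    return sum(
        1
        for i in range(len(sequence) - len(motif) + 1)
        if sequence.startswith(motif, i)
    )


def z_score(observed, background_counts):
    """Standardize an observed count against empirical background counts."""
    mu = mean(background_counts)
    sigma = pstdev(background_counts)
    if sigma == 0:
        raise ValueError("background counts have zero variance")
    return (observed - mu) / sigma


# Toy example: the motif count in a sequence of interest versus
# counts in background sequences (all invented for illustration).
observed = count_occurrences("TATATATA", "TATA")
background = [count_occurrences(s, "TATA") for s in ["TATAGGCC", "GGCCGGCC", "TATATAGG"]]
z = z_score(observed, background)
```

A large positive z-score indicates that the motif occurs more often than expected from the background.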


Alignment of Flexible Protein Structures

AAAI Conferences

Both apply efficient structural pattern detection and graph theoretic techniques. The FlexProt algorithm simultaneously detects the hinge regions and aligns the rigid subparts of the molecules. It does so by efficiently detecting maximal congruent rigid fragments in both molecules and calculating their optimal arrangement, which does not violate the protein sequence order. The FlexMol algorithm is sequence order independent, yet requires as input the hypothesized hinge positions. Due to its sequence order independence, it can also be applied to protein-protein interface matching and drug molecule alignment.


Genes, Themes and Microarrays

AAAI Conferences

The development of DNA microarrays during the last few years (Schena et al. 1995; DeRisi, Iyer, & Brown 1997) allows researchers to simultaneously measure the expression levels of thousands of different genes. Experiments involving such arrays produce overwhelming amounts of data. In response, much recent work has been concerned with automating the analysis of microarray data.


CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis

AAAI Conferences

Novel DNA microarray technologies (Eisen & Brown 1999) enable the monitoring of expression levels of thousands of genes simultaneously. For the first time, this allows a global view of the transcription levels of many (or all) genes when the cell undergoes specific conditions or processes. The potential of such technologies for functional genomics is tremendous: measuring gene expression levels in different developmental stages, different body tissues, different clinical conditions and different organisms is instrumental in understanding gene function, gene networks, biological processes and effects of medical treatments. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns over several conditions. The corresponding algorithmic problem is to cluster multicondition gene expression patterns. The grouping of genes with similar expression patterns into clusters helps in unraveling relations between genes, deducing the function of genes and revealing the underlying gene regulatory network. A clustering problem consists of n elements and a characteristic vector for each element. In gene expression data, elements are genes, and the vector of each gene contains its expression levels under some conditions. These levels are obtained by measuring the intensity of hybridization of gene-specific oligonucleotides (or cDNA molecules), which are immobilized to a surface, to a labeled target RNA mixture (cf.

Copyright 2000, American Association for Artificial Intelligence