Application of three graph Laplacian based semi-supervised learning methods to protein function prediction problem

arXiv.org Machine Learning

Protein function prediction is an important problem in modern biology. In this paper, the un-normalized, symmetric normalized, and random walk graph Laplacian based semi-supervised learning methods are applied to an integrated network, combined from multiple networks, to predict the functions of all yeast proteins appearing in those networks. The individual networks are built from Pfam domain structure, co-participation in a protein complex, protein-protein interactions, genetic interactions, and cell cycle gene expression measurements. The networks are combined with fixed weights rather than with convex optimization of the combination weights, because the convex optimization approach has high time complexity; this simple combination does not affect the accuracy measures of the three semi-supervised learning methods. Experimental results show that the un-normalized and symmetric normalized graph Laplacian based methods perform slightly better than the random walk graph Laplacian based method on the integrated network. Moreover, the accuracy measures of all three semi-supervised learning methods on the integrated network are much better than the best accuracy measures of these methods on any individual network.
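For orientation, the sketch below shows the three graph Laplacians the abstract refers to and a standard Laplacian-regularized label-propagation step. It is a minimal illustration, not the authors' code; the function names, the fixed-weight combination line, and the regularization parameter mu are assumptions made here, and the graph is assumed to have no isolated vertices.

```python
# Illustrative sketch only: the three graph Laplacians and a generic
# Laplacian-regularized semi-supervised step. Not the paper's implementation.
import numpy as np

def graph_laplacians(W):
    """W: symmetric weighted adjacency matrix (no isolated vertices assumed).
    Returns the un-normalized, symmetric normalized, and random walk Laplacians."""
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                                    # un-normalized: L = D - W
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ L @ D_inv_sqrt          # symmetric normalized: I - D^{-1/2} W D^{-1/2}
    L_rw = np.diag(1.0 / d) @ L                  # random walk: I - D^{-1} W
    return L, L_sym, L_rw

def laplacian_ssl(W, Y, mu=1.0):
    """Standard regularized least-squares propagation: solve (L + mu*I) F = mu*Y,
    where Y holds +/-1 for labeled proteins and 0 for unlabeled ones."""
    L, _, _ = graph_laplacians(W)
    n = W.shape[0]
    F = np.linalg.solve(L + mu * np.eye(n), mu * Y)
    return np.sign(F)

# Fixed-weight combination of several networks, as described in the abstract
# (the equal weights here are an assumption for illustration):
# W_integrated = sum(W_k for W_k in individual_networks) / len(individual_networks)
```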


Hypergraph and protein function prediction with gene expression data

arXiv.org Machine Learning

Most network-based protein (or gene) function prediction methods rely on the assumption that the labels of two adjacent proteins in the network are likely to be the same. However, this pairwise relationship between proteins or genes is not complete: the information carried by groups of genes that show very similar expression patterns and tend to have similar functions (i.e., functional modules) is missed. A natural way to overcome this information loss is to represent the gene expression data as a hypergraph. Thus, in this paper, the un-normalized, random walk, and symmetric normalized hypergraph Laplacian based semi-supervised learning methods are applied to a hypergraph constructed from gene expression data in order to predict the functions of yeast proteins. Experimental results show that the average accuracy measures of these three hypergraph Laplacian based semi-supervised learning methods are the same; however, they are much greater than the average accuracy measure of the un-normalized graph Laplacian based semi-supervised learning method (the baseline method of this paper) applied to the gene co-expression network created from the same gene expression data.
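The sketch below builds the standard un-normalized and symmetric normalized hypergraph Laplacians from an incidence matrix, to make the hypergraph representation concrete. It assumes the hyperedges (e.g., groups of co-expressed genes) are already given as an incidence matrix H with edge weights w; the exact hypergraph construction and Laplacian variants used in the paper may differ.

```python
# Hedged sketch: standard (Zhou-style) hypergraph Laplacians from an incidence
# matrix. H has shape (n_vertices, n_edges) with H[v, e] = 1 if vertex v is in
# hyperedge e. Not necessarily the exact construction used in the paper.
import numpy as np

def hypergraph_laplacians(H, w=None):
    n, m = H.shape
    w = np.ones(m) if w is None else w
    W = np.diag(w)
    d_v = H @ w                          # vertex degrees
    d_e = H.sum(axis=0)                  # hyperedge cardinalities
    De_inv = np.diag(1.0 / d_e)
    A = H @ W @ De_inv @ H.T             # vertex-vertex similarity induced by hyperedges
    L_un = np.diag(d_v) - A              # un-normalized hypergraph Laplacian
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    L_sym = np.eye(n) - Dv_inv_sqrt @ A @ Dv_inv_sqrt   # symmetric normalized
    return L_un, L_sym
```

Either Laplacian can then be plugged into the same regularized propagation step shown in the earlier sketch.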


Un-normalized hypergraph p-Laplacian based semi-supervised learning methods

arXiv.org Machine Learning

Most network-based machine learning methods assume that the labels of two adjacent samples in the network are likely to be the same. However, this pairwise relationship between samples is not complete: the information carried by groups of samples that show very similar patterns and tend to have similar labels is missed. A natural way to overcome this information loss is to represent the feature dataset of samples as a hypergraph. Thus, in this paper, we present the un-normalized hypergraph p-Laplacian semi-supervised learning methods. These methods are applied to the zoo dataset and a tiny version of the 20 newsgroups dataset. Experimental results show that the accuracy measures of these un-normalized hypergraph p-Laplacian based semi-supervised learning methods are significantly greater than the accuracy measure of the un-normalized hypergraph Laplacian based semi-supervised learning method (the current state-of-the-art hypergraph Laplacian based semi-supervised learning method for classification, i.e., the case p = 2).
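To give a feel for how p generalizes the quadratic (p = 2) case, the sketch below minimizes a p-Dirichlet energy plus a label-fitting term by gradient descent over pairwise weights. This is only an informal analogue: the pairwise weights stand in for a clique-style expansion of the hyperedges, and the paper's un-normalized hypergraph p-Laplacian formulation may differ from this simplification.

```python
# Hedged sketch of p-Laplacian regularized SSL on pairwise weights W:
# minimize sum_ij w_ij |f_i - f_j|^p + mu * sum_{labeled} (f_i - y_i)^2
# by plain gradient descent. Illustrative only.
import numpy as np

def p_laplacian_ssl(W, y, labeled, p=1.5, mu=1.0, lr=0.01, iters=2000):
    n = W.shape[0]
    f = np.zeros(n)
    f[labeled] = y[labeled]
    for _ in range(iters):
        diff = f[:, None] - f[None, :]
        # gradient of the p-Dirichlet energy with respect to each f_i
        grad_smooth = p * (W * np.sign(diff) * np.abs(diff) ** (p - 1)).sum(axis=1)
        grad_fit = np.zeros(n)
        grad_fit[labeled] = 2.0 * mu * (f[labeled] - y[labeled])
        f -= lr * (grad_smooth + grad_fit)
    return np.sign(f)
```

Setting p = 2 recovers the usual quadratic Laplacian regularizer; smaller p penalizes large label differences across edges less sharply.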


Structured Graph Learning Via Laplacian Spectral Constraints

Neural Information Processing Systems

Learning a graph with a specific structure is essential for interpretability and for identifying relationships among data, but structured graph learning from observed samples is an NP-hard combinatorial problem. In this paper, we first show that, for a set of important graph families, it is possible to convert the combinatorial constraints on structure into eigenvalue constraints on the graph Laplacian matrix. We then introduce a unified graph learning framework that integrates the spectral properties of the Laplacian matrix with Gaussian graphical modeling and is capable of learning the structures of a large class of graph families. The proposed algorithms are provably convergent and practically amenable to big-data tasks.
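A small illustration of the spectral fact this conversion rests on: a graph has k connected components exactly when its Laplacian has a zero eigenvalue of multiplicity k, so a combinatorial "k-component" constraint becomes a constraint on the k smallest eigenvalues. The snippet below only demonstrates that correspondence; it is not the paper's learning algorithm.

```python
# Spectral check: number of connected components = multiplicity of the zero
# eigenvalue of the graph Laplacian. Illustration only, not the paper's solver.
import numpy as np

def num_components_from_spectrum(W, tol=1e-8):
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)
    return int(np.sum(eigvals < tol))

# Example: two disjoint triangles -> exactly two zero eigenvalues.
block = np.ones((3, 3)) - np.eye(3)
W = np.block([[block, np.zeros((3, 3))],
              [np.zeros((3, 3)), block]])
assert num_components_from_spectrum(W) == 2
```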


Clustering Signed Networks with the Geometric Mean of Laplacians

Neural Information Processing Systems

Signed networks allow one to model both positive and negative relationships. We analyze existing extensions of spectral clustering to signed networks and find that they fail to recover the ground-truth clustering in several situations where either the positive or the negative part of the network contains no noise. Our analysis shows that these problems arise because existing approaches take some form of arithmetic mean of the Laplacians of the positive and negative parts. As a solution, we propose to use the geometric mean of the Laplacians of the positive and negative parts, and show that it outperforms the existing approaches.
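The sketch below computes the matrix geometric mean A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} of two symmetric positive definite matrices and uses its smallest eigenvectors as a spectral embedding. It is a minimal sketch under the assumption that L_pos and Q_neg are Laplacian-type matrices built from the positive and negative parts of the signed graph (made definite by a small diagonal shift); the paper's exact operators and solver may differ, and the dense routines here ignore the scalability aspects.

```python
# Hedged sketch: matrix geometric mean of two SPD matrices and a spectral
# embedding from its smallest eigenvectors. Illustration, not the paper's code.
import numpy as np
from scipy.linalg import sqrtm, eigh

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2} for SPD A, B."""
    A_half = np.real(sqrtm(A))
    A_half_inv = np.linalg.inv(A_half)
    inner = np.real(sqrtm(A_half_inv @ B @ A_half_inv))
    G = A_half @ inner @ A_half
    return (G + G.T) / 2.0          # symmetrize against numerical noise

def signed_spectral_embedding(L_pos, Q_neg, k, eps=1e-6):
    """Embed vertices with the k eigenvectors of the geometric mean that
    correspond to its smallest eigenvalues; cluster them with k-means."""
    n = L_pos.shape[0]
    G = geometric_mean(L_pos + eps * np.eye(n), Q_neg + eps * np.eye(n))
    _, vecs = eigh(G)               # eigenvalues in ascending order
    return vecs[:, :k]
```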