Genre
Classification of Approaches and Challenges of Frequent Subgraphs Mining in Biological Networks
Keyvanpour, Mohammadreza, Azizani, Fereshteh
Understanding the structure and dynamics of biological networks is one of the important challenges in system biology. In addition, increasing amount of experimental data in biological networks necessitate the use of efficient methods to analyze these huge amounts of data. Such methods require to recognize common patterns to analyze data. As biological networks can be modeled by graphs, the problem of common patterns recognition is equivalent with frequent sub graph mining in a set of graphs. In this paper, at first the challenges of frequent subgrpahs mining in biological networks are introduced and the existing approaches are classified for each challenge.
MahNMF: Manhattan Non-negative Matrix Factorization
Guan, Naiyang, Tao, Dacheng, Luo, Zhigang, Shawe-Taylor, John
Non-negative matrix factorization (NMF) approximates a non-negative matrix $X$ by a product of two non-negative low-rank factor matrices $W$ and $H$. NMF and its extensions minimize either the Kullback-Leibler divergence or the Euclidean distance between $X$ and $W^T H$ to model the Poisson noise or the Gaussian noise. In practice, when the noise distribution is heavy tailed, they cannot perform well. This paper presents Manhattan NMF (MahNMF) which minimizes the Manhattan distance between $X$ and $W^T H$ for modeling the heavy tailed Laplacian noise. Similar to sparse and low-rank matrix decompositions, MahNMF robustly estimates the low-rank part and the sparse part of a non-negative matrix and thus performs effectively when data are contaminated by outliers. We extend MahNMF for various practical applications by developing box-constrained MahNMF, manifold regularized MahNMF, group sparse MahNMF, elastic net inducing MahNMF, and symmetric MahNMF. The major contribution of this paper lies in two fast optimization algorithms for MahNMF and its extensions: the rank-one residual iteration (RRI) method and Nesterov's smoothing method. In particular, by approximating the residual matrix by the outer product of one row of W and one row of $H$ in MahNMF, we develop an RRI method to iteratively update each variable of $W$ and $H$ in a closed form solution. Although RRI is efficient for small scale MahNMF and some of its extensions, it is neither scalable to large scale matrices nor flexible enough to optimize all MahNMF extensions. Since the objective functions of MahNMF and its extensions are neither convex nor smooth, we apply Nesterov's smoothing method to recursively optimize one factor matrix with another matrix fixed. By setting the smoothing parameter inversely proportional to the iteration number, we improve the approximation accuracy iteratively for both MahNMF and its extensions.
Isabelle/jEdit --- a Prover IDE within the PIDE framework
PIDE is a general framework for document-oriented prover interaction and integration, based on a bilingual architecture that combines ML and Scala. The overall aim is to connect LCF-style provers like Isabelle (or Coq or HOL) with sophisticated front-end technology on the JVM platform, overcoming command-line interaction at last. The present system description specifically covers Isabelle/jEdit as part of the official release of Isabelle2011-1 (October 2011). It is a concrete Prover IDE implementation based on Isabelle/PIDE library modules (implemented in Scala) on the one hand, and the well-known text editor framework of jEdit (implemented in Java) on the other hand. The interaction model of our Prover IDE follows the idea of continuous proof checking: the theory source text is annotated by semantic information by the prover as it becomes available incrementally. This works via an asynchronous protocol that neither blocks the editor nor stops the prover from exploiting parallelism on multi-core hardware. The jEdit GUI provides standard metaphors for augmented text editing (highlighting, squiggles, tooltips, hyperlinks etc.) that we have instrumented to render the formal content from the prover context. Further refinement of the jEdit display engine via suitable plugins and fonts approximates mathematical rendering in the text buffer, including symbols from the TeX repertoire, and sub-/superscripts. Isabelle/jEdit is presented here both as a usable interface for current Isabelle, and as a reference application to inspire further projects based on PIDE.
An Approach to Model Interest for Planetary Rover through Dezert-Smarandache Theory
Ceriotti, Matteo, Vasile, Massimiliano, Giardini, Giovanni, Massari, Mauro
In this paper, we propose an approach for assigning an interest level to the goals of a planetary rover. Assigning an interest level to goals, allows the rover autonomously to transform and reallocate the goals. The interest level is defined by data-fusing payload and navigation information. The fusion yields an "interest map", that quantifies the level of interest of each area around the rover. In this way the planner can choose the most interesting scientific objectives to be analyzed, with limited human intervention, and reallocates its goals autonomously. The Dezert-Smarandache Theory of Plausible and Paradoxical Reasoning was used for information fusion: this theory allows dealing with vague and conflicting data. In particular, it allows us directly to model the behavior of the scientists that have to evaluate the relevance of a particular set of goals. The paper shows an application of the proposed approach to the generation of a reliable interest map.
Comparative Study for Inference of Hidden Classes in Stochastic Block Models
Zhang, Pan, Krzakala, Florent, Reichardt, Jรถrg, Zdeborovรก, Lenka
Inference of hidden classes in stochastic block model is a classical problem with important applications. Most commonly used methods for this problem involve na\"{\i}ve mean field approaches or heuristic spectral methods. Recently, belief propagation was proposed for this problem. In this contribution we perform a comparative study between the three methods on synthetically created networks. We show that belief propagation shows much better performance when compared to na\"{\i}ve mean field and spectral approaches. This applies to accuracy, computational efficiency and the tendency to overfit the data.
Verifying an algorithm computing Discrete Vector Fields for digital imaging
Heras, Jรณnathan, Poza, Marรญa, Rubio, Julio
In this paper, we present a formalization of an algorithm to construct admissible discrete vector fields in the Coq theorem prover taking advantage of the SSReflect library. Discrete vector fields are a tool which has been welcomed in the homological analysis of digital images since it provides a procedure to reduce the amount of information but preserving the homological properties. In particular, thanks to discrete vector fields, we are able to compute, inside Coq, homological properties of biomedical images which otherwise are out of the reach of this system.
Hypothesis Testing in Speckled Data with Stochastic Distances
Nascimento, Abraรฃo D. C., Cintra, Renato J., Frery, Alejandro C.
Images obtained with coherent illumination, as is the case of sonar, ultrasound-B, laser and Synthetic Aperture Radar -- SAR, are affected by speckle noise which reduces the ability to extract information from the data. Specialized techniques are required to deal with such imagery, which has been modeled by the G0 distribution and under which regions with different degrees of roughness and mean brightness can be characterized by two parameters; a third parameter, the number of looks, is related to the overall signal-to-noise ratio. Assessing distances between samples is an important step in image analysis; they provide grounds of the separability and, therefore, of the performance of classification procedures. This work derives and compares eight stochastic distances and assesses the performance of hypothesis tests that employ them and maximum likelihood estimation. We conclude that tests based on the triangular distance have the closest empirical size to the theoretical one, while those based on the arithmetic-geometric distances have the best power. Since the power of tests based on the triangular distance is close to optimum, we conclude that the safest choice is using this distance for hypothesis testing, even when compared with classical distances as Kullback-Leibler and Bhattacharyya.
Detecting Activations over Graphs using Spanning Tree Wavelet Bases
Sharpnack, James, Krishnamurthy, Akshay, Singh, Aarti
We consider the detection of activations over graphs under Gaussian noise, where signals are piece-wise constant over the graph. Despite the wide applicability of such a detection algorithm, there has been little success in the development of computationally feasible methods with proveable theoretical guarantees for general graph topologies. We cast this as a hypothesis testing problem, and first provide a universal necessary condition for asymptotic distinguishability of the null and alternative hypotheses. We then introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph, and prove that for any spanning tree, this approach can distinguish null from alternative in a low signal-to-noise regime. Lastly, we improve on this result and show that using the uniform spanning tree in the basis construction yields a randomized test with stronger theoretical guarantees that in many cases matches our necessary conditions. Specifically, we obtain near-optimal performance in edge transitive graphs, $k$-nearest neighbor graphs, and $\epsilon$-graphs.
A Hierarchical Graphical Model for Record Linkage
Ravikumar, Pradeep, Cohen, William
The task of matching co-referent records is known among other names as rocord linkage. For large record-linkage problems, often there is little or no labeled data available, but unlabeled data shows a reasonable clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for the linakge-problem in an unsupervised setting. In addition to proposing new methods, we also cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in the above model are of interest in the general graphical model setting. We describe a method for incorporating monotinicity constraints in a graphical model. We also outline a bootstrapping approach of using "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods perform quite competitively even with fully supervised record-linkage methods.
Biogeography-Based Informative Gene Selection and Cancer Classification Using SVM and Random Forests
Nikumbh, Sarvesh, Ghosh, Shameek, Jayaraman, Valadi
Microarray cancer gene expression data comprise of very high dimensions. Reducing the dimensions helps in improving the overall analysis and classification performance. We propose two hybrid techniques, Biogeography - based Optimization - Random Forests (BBO - RF) and BBO - SVM (Support Vector Machines) with gene ranking as a heuristic, for microarray gene expression analysis. This heuristic is obtained from information gain filter ranking procedure. The BBO algorithm generates a population of candidate subset of genes, as part of an ecosystem of habitats, and employs the migration and mutation processes across multiple generations of the population to improve the classification accuracy. The fitness of each gene subset is assessed by the classifiers - SVM and Random Forests. The performances of these hybrid techniques are evaluated on three cancer gene expression datasets retrieved from the Kent Ridge Biomedical datasets collection and the libSVM data repository. Our results demonstrate that genes selected by the proposed techniques yield classification accuracies comparable to previously reported algorithms.