Mathematical & Statistical Methods
Graph Kernels via Functional Embedding
Shrivastava, Anshumali, Li, Ping
We propose a representation of graph as a functional object derived from the power iteration of the underlying adjacency matrix. The proposed functional representation is a graph invariant, i.e., the functional remains unchanged under any reordering of the vertices. This property eliminates the difficulty of handling exponentially many isomorphic forms. Bhattacharyya kernel constructed between these functionals significantly outperforms the state-of-the-art graph kernels on 3 out of the 4 standard benchmark graph classification datasets, demonstrating the superiority of our approach. The proposed methodology is simple and runs in time linear in the number of edges, which makes our kernel more efficient and scalable compared to many widely adopted graph kernels with running time cubic in the number of vertices.
A New Space for Comparing Graphs
Shrivastava, Anshumali, Li, Ping
Finding a new mathematical representations for graph, which allows direct comparison between different graph structures, is an open-ended research direction. Having such a representation is the first prerequisite for a variety of machine learning algorithms like classification, clustering, etc., over graph datasets. In this paper, we propose a symmetric positive semidefinite matrix with the $(i,j)$-{th} entry equal to the covariance between normalized vectors $A^ie$ and $A^je$ ($e$ being vector of all ones) as a representation for graph with adjacency matrix $A$. We show that the proposed matrix representation encodes the spectrum of the underlying adjacency matrix and it also contains information about the counts of small sub-structures present in the graph such as triangles and small paths. In addition, we show that this matrix is a \emph{"graph invariant"}. All these properties make the proposed matrix a suitable object for representing graphs. The representation, being a covariance matrix in a fixed dimensional metric space, gives a mathematical embedding for graphs. This naturally leads to a measure of similarity on graph objects. We define similarity between two given graphs as a Bhattacharya similarity measure between their corresponding covariance matrix representations. As shown in our experimental study on the task of social network classification, such a similarity measure outperforms other widely used state-of-the-art methodologies. Our proposed method is also computationally efficient. The computation of both the matrix representation and the similarity value can be performed in operations linear in the number of edges. This makes our method scalable in practice. We believe our theoretical and empirical results provide evidence for studying truncated power iterations, of the adjacency matrix, to characterize social networks.
A Tutorial on Independent Component Analysis
Independent component analysis (ICA) has become a standard data analysis technique applied to an array of problems in signal processing and machine learning. This tutorial provides an introduction to ICA based on linear algebra formulating an intuition for ICA from first principles. The goal of this tutorial is to provide a solid foundation on this advanced topic so that one might learn the motivation behind ICA, learn why and when to apply this technique and in the process gain an introduction to this exciting field of active research.
Stochastic processes and feedback-linearisation for online identification and Bayesian adaptive control of fully-actuated mechanical systems
Calliess, Jan-Peter, Papachristodoulou, Antonis, Roberts, Stephen J.
This work proposes a new method for simultaneous probabilistic identification and control of an observable, fully-actuated mechanical system. Identification is achieved by conditioning stochastic process priors on observations of configurations and noisy estimates of configuration derivatives. In contrast to previous work that has used stochastic processes for identification, we leverage the structural knowledge afforded by Lagrangian mechanics and learn the drift and control input matrix functions of the control-affine system separately. We utilise feedback-linearisation to reduce, in expectation, the uncertain nonlinear control problem to one that is easy to regulate in a desired manner. Thereby, our method combines the flexibility of nonparametric Bayesian learning with epistemological guarantees on the expected closed-loop trajectory. We illustrate our method in the context of torque-actuated pendula where the dynamics are learned with a combination of normal and log-normal processes.
Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence
Berglund, Mathias, Raiko, Tapani
Contrastive Divergence (CD) and Persistent Contrastive Divergence (PCD) are popular methods for training the weights of Restricted Boltzmann Machines. However, both methods use an approximate method for sampling from the model distribution. As a side effect, these approximations yield significantly different biases and variances for stochastic gradient estimates of individual data points. It is well known that CD yields a biased gradient estimate. In this paper we however show empirically that CD has a lower stochastic gradient estimate variance than exact sampling, while the mean of subsequent PCD estimates has a higher variance than exact sampling. The results give one explanation to the finding that CD can be used with smaller minibatches or higher learning rates than PCD.
An SIR Graph Growth Model for the Epidemics of Communicable Diseases
Dhanjal, Charanpal, Clรฉmenรงon, Stรฉphan
It is the main purpose of this paper to introduce a graph-valued stochastic process in order to model the spread of a communicable infectious disease. The major novelty of the SIR model we promote lies in the fact that the social network on which the epidemics is taking place is not specified in advance but evolves through time, accounting for the temporal evolution of the interactions involving infective individuals. Without assuming the existence of a fixed underlying network model, the stochastic process introduced describes, in a flexible and realistic manner, epidemic spread in non-uniformly mixing and possibly heterogeneous populations. It is shown how to fit such a (parametrised) model by means of Approximate Bayesian Computation methods based on graph-valued statistics. The concepts and statistical methods described in this paper are finally applied to a real epidemic dataset, related to the spread of HIV in Cuba in presence of a contact tracing system, which permits one to reconstruct partly the evolution of the graph of sexual partners diagnosed HIV positive between 1986 and 2006.
Distinguishing noise from chaos: objective versus subjective criteria using Horizontal Visibility Graph
Ravetti, Martรญn Gรณmez, Carpi, Laura C., Gonรงalves, Bruna Amin, Frery, Alejandro C., Rosso, Osvaldo A.
A recently proposed methodology called the Horizontal Visibility Graph (HVG) [Luque {\it et al.}, Phys. Rev. E., 80, 046103 (2009)] that constitutes a geometrical simplification of the well known Visibility Graph algorithm [Lacasa {\it et al.\/}, Proc. Natl. Sci. U.S.A. 105, 4972 (2008)], has been used to study the distinction between deterministic and stochastic components in time series [L. Lacasa and R. Toral, Phys. Rev. E., 82, 036120 (2010)]. Specifically, the authors propose that the node degree distribution of these processes follows an exponential functional of the form $P(\kappa)\sim \exp(-\lambda~\kappa)$, in which $\kappa$ is the node degree and $\lambda$ is a positive parameter able to distinguish between deterministic (chaotic) and stochastic (uncorrelated and correlated) dynamics. In this work, we investigate the characteristics of the node degree distributions constructed by using HVG, for time series corresponding to $28$ chaotic maps and $3$ different stochastic processes. We thoroughly study the methodology proposed by Lacasa and Toral finding several cases for which their hypothesis is not valid. We propose a methodology that uses the HVG together with Information Theory quantifiers. An extensive and careful analysis of the node degree distributions obtained by applying HVG allow us to conclude that the Fisher-Shannon information plane is a remarkable tool able to graphically represent the different nature, deterministic or stochastic, of the systems under study.
An Approximate, Efficient LP Solver for LP Rounding
Sridhar, Srikrishna, Wright, Stephen, Re, Christopher, Liu, Ji, Bittorf, Victor, Zhang, Ce
Many problems in machine learning can be solved by rounding the solution of an appropriate linear program (LP). This paper shows that we can recover solutions of comparable quality by rounding an approximate LP solution instead of the exact one.These approximate LP solutions can be computed efficiently by applying a parallel stochastic-coordinate-descent method to a quadratic-penalty formulation ofthe LP. We derive worst-case runtime and solution quality guarantees of this scheme using novel perturbation and convergence analysis. Our experiments demonstrate that on such combinatorial problems as vertex cover, independent set and multiway-cut, our approximate rounding scheme is up to an order of magnitude fasterthan Cplex (a commercial LP solver) while producing solutions of similar quality.
Stochastic blockmodel approximation of a graphon: Theory and consistent estimation
Airoldi, Edo M., Costa, Thiago B., Chan, Stanley H.
Given a convergent sequence of graphs, there exists a limit object called the graphon from which random graphs are generated. This nonparametric perspective of random graphs opens the door to study graphs beyond the traditional parametric models, but at the same time also poses the challenging question of how to estimate the graphon underlying observed graphs. In this paper, we propose a computationally efficient algorithm to estimate a graphon from a set of observed graphs generated from it. We show that, by approximating the graphon with stochastic block models, the graphon can be consistently estimated, that is, the estimation error vanishes as the size of the graph approaches infinity.
Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Sharpnack, James L., Krishnamurthy, Akshay, Singh, Aarti
The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lov\'asz extended scan statistic (LESS) that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, $k$-nearest neighbor graphs, and $\epsilon$-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.