 Undirected Networks


Reconstructing undirected graphs from eigenspaces

arXiv.org Machine Learning

In this paper, we aim to recover an undirected weighted graph of $N$ vertices from the knowledge of a perturbed version of the eigenspaces of its adjacency matrix $W$. For instance, this situation arises for stationary signals on graphs or for Markov chains observed at random times. Our approach is based on minimizing a cost function given by the Frobenius norm of the commutator $\mathsf{A} \mathsf{B}-\mathsf{B} \mathsf{A}$ between symmetric matrices $\mathsf{A}$ and $\mathsf{B}$. In the Erdős-Rényi model with no self-loops, we show that identifiability (i.e., the ability to reconstruct $W$ from the knowledge of its eigenspaces) follows a sharp phase transition on the expected number of edges with threshold function $N\log N/2$. Given an estimate of the eigenspaces based on an $n$-sample, we provide support selection procedures from both theoretical and practical points of view. In particular, when deleting an edge from the active support, our study shows that our test statistic is of order $\mathcal O(1/n)$ when we overestimate the true support and is lower bounded by a positive constant when the estimated support is smaller than the true support. This feature leads to a powerful practical support estimation procedure. Simulated and real-life numerical experiments support our new methodology.
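
The commutator cost mentioned in the abstract can be illustrated with a short NumPy sketch. It is not the authors' estimation procedure: the matrix size, the random matrices, and the helper name `commutator_norm` are assumptions made for illustration. The sketch only shows that the cost vanishes for a symmetric matrix sharing the eigenvectors of $W$ and is clearly positive for an unrelated symmetric matrix.

```python
import numpy as np

def commutator_norm(A, B):
    """Frobenius norm of the commutator AB - BA (hypothetical helper)."""
    return np.linalg.norm(A @ B - B @ A, ord="fro")

rng = np.random.default_rng(0)
N = 5  # illustrative number of vertices

# Random symmetric weighted adjacency matrix W with no self-loops.
W = np.triu(rng.random((N, N)), k=1)
W = W + W.T

# B_same reuses the eigenvectors of W with different eigenvalues.
_, U = np.linalg.eigh(W)
B_same = U @ np.diag(np.arange(1.0, N + 1)) @ U.T

# B_other is an unrelated symmetric matrix.
M = rng.random((N, N))
B_other = M + M.T

same = commutator_norm(W, B_same)    # numerically zero: shared eigenvectors
other = commutator_norm(W, B_other)  # clearly positive in general
print(same, other)
```

Minimizing this quantity over candidate matrices with a given support is what would turn the sketch into a reconstruction procedure; that optimization step is deliberately left out here.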


The Viterbi Algorithm Demystified - USC Viterbi School of Engineering

#artificialintelligence

Fifty years ago, I published a paper, "Error bounds for convolutional codes and an asymptotically optimum decoding algorithm," on the important class of convolutional codes, which is particularly effective in preventing errors in digital communication over wireless and other transmission media. The algorithm, which became labeled with my name, was a crucial step in establishing the merits as well as evaluating the performance of these codes. The paper was read and understood by only a few specialists. In the next few years, clarity was provided by two papers, the first by a colleague, G.D. Forney Jr., who introduced the trellis model, and the second by myself based on a state diagram, or Markov model. A.A. Markov was a Russian mathematician who proposed and analyzed a statistical concept regarding the relationship between terms of a sequence or, more generally, of successive events; specifically, that each term (or string of terms) or event is statistically dependent only on the previous one.


Pufferfish Privacy Mechanisms for Correlated Data

arXiv.org Machine Learning

Many modern databases include personal and sensitive correlated data, such as private information on users connected together in a social network, and measurements of physical activity of single subjects across time. However, differential privacy, the current gold standard in data privacy, does not adequately address privacy issues in this kind of data. This work looks at a recent generalization of differential privacy, called Pufferfish, that can be used to address privacy in correlated data. The main challenge in applying Pufferfish is a lack of suitable mechanisms. We provide the first mechanism -- the Wasserstein Mechanism -- which applies to any general Pufferfish framework. Since this mechanism may be computationally inefficient, we provide an additional mechanism that applies to some practical cases such as physical activity measurements across time, and is computationally efficient. Our experimental evaluations indicate that this mechanism provides privacy and utility for synthetic as well as real data in two separate domains.


Markov Chain Truncation for Doubly-Intractable Inference

arXiv.org Machine Learning

Computing partition functions, the normalizing constants of probability distributions, is often hard. Variants of importance sampling give unbiased estimates of a normalizer Z; however, unbiased estimates of the reciprocal 1/Z are harder to obtain. Unbiased estimates of 1/Z allow Markov chain Monte Carlo sampling of "doubly-intractable" distributions, such as the parameter posterior for Markov Random Fields or Exponential Random Graphs. We demonstrate how to construct unbiased estimates for 1/Z given access to black-box importance sampling estimators for Z. We adapt recent work on random series truncation and Markov chain coupling, producing estimators with lower variance and a higher percentage of positive estimates than before. Our debiasing algorithms are simple to implement, and have some theoretical and empirical advantages over existing methods.
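
One standard way to see how random series truncation can debias 1/Z is the geometric-series construction sketched below. It is an illustration under stated assumptions, not necessarily the estimator proposed in the paper: the constant $c$, the independent unbiased estimates $\hat Z_i$, the truncation level $K$ (drawn independently of the $\hat Z_i$), and the tail probabilities $p_k$ are all notation introduced here, and unbiasedness additionally requires conditions that allow expectation and summation to be exchanged.

```latex
% Assumption: c > 0 is chosen so that |1 - Z/c| < 1.
\frac{1}{Z} = \frac{1}{c} \sum_{k=0}^{\infty} \Bigl(1 - \frac{Z}{c}\Bigr)^{k},
\qquad
\mathbb{E}\Bigl[\prod_{i=1}^{k} \Bigl(1 - \frac{\hat Z_i}{c}\Bigr)\Bigr]
  = \Bigl(1 - \frac{Z}{c}\Bigr)^{k}.

% Random truncation at level K, with p_k = P(K \ge k) > 0 for every k:
\widehat{Z^{-1}}
  = \frac{1}{c} \sum_{k=0}^{K} \frac{1}{p_k}
    \prod_{i=1}^{k} \Bigl(1 - \frac{\hat Z_i}{c}\Bigr),
\qquad
\mathbb{E}\bigl[\widehat{Z^{-1}}\bigr] = \frac{1}{Z}.
```

Individual factors $1 - \hat Z_i / c$ can be negative, so the estimate itself can be negative, which is why the percentage of positive estimates is a meaningful quality criterion.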


Markov Chain Lifting and Distributed ADMM

arXiv.org Machine Learning

The time to converge to the steady state of a finite Markov chain can be greatly reduced by a lifting operation, which creates a new Markov chain on an expanded state space. For a class of quadratic objectives, we show an analogous behavior where a distributed ADMM algorithm can be seen as a lifting of the Gradient Descent algorithm. This provides deep insight into its faster convergence rate under optimal parameter tuning. We conjecture that this gain is always present, as opposed to the lifting of a Markov chain, which sometimes provides only a marginal speedup.


Parallel Markov Chain Monte Carlo for the Indian Buffet Process

arXiv.org Machine Learning

Indian Buffet Process-based models are an elegant way to discover underlying features within a data set, but inference in such models can be slow. Inferring underlying features using Markov chain Monte Carlo either relies on an uncollapsed representation, which leads to poor mixing, or on a collapsed representation, which leads to a quadratic increase in computational complexity. Existing attempts at distributing inference have introduced additional approximation within the inference procedure. In this paper we present a novel algorithm to perform asymptotically exact parallel Markov chain Monte Carlo inference for Indian Buffet Process models. We take advantage of the fact that the features are conditionally independent under the beta-Bernoulli process. Because of this conditional independence, we can partition the features into two parts: one part containing only the finitely many instantiated features and the other part containing the infinite tail of uninstantiated features. For the finite partition, parallel inference is simple given the instantiation of features. But for the infinite tail, performing uncollapsed MCMC leads to poor mixing and hence we collapse out the features. The resulting hybrid sampler, while being parallel, produces samples asymptotically from the true posterior.
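
To make the split between finitely many instantiated features and an infinite uninstantiated tail concrete, the sketch below draws a binary feature matrix from the Indian Buffet Process prior using the usual sequential "buffet" scheme. It covers the prior only and is not the parallel hybrid sampler described above; the function name `sample_ibp`, the concentration value, and the seed are assumptions chosen for illustration.

```python
import numpy as np

def sample_ibp(num_objects, alpha, rng):
    """Draw a binary feature matrix from the IBP prior (illustrative sketch)."""
    counts = []  # counts[k] = number of objects that already own feature k
    rows = []    # rows[i] = feature indices owned by object i
    for i in range(1, num_objects + 1):
        # Existing feature k is taken with probability counts[k] / i.
        owned = [k for k, m in enumerate(counts) if rng.random() < m / i]
        # Object i also instantiates Poisson(alpha / i) new features.
        num_new = rng.poisson(alpha / i)
        owned.extend(range(len(counts), len(counts) + num_new))
        for k in owned:
            if k < len(counts):
                counts[k] += 1
            else:
                counts.append(1)
        rows.append(owned)
    # Only the instantiated features get a column; the tail stays implicit.
    Z = np.zeros((num_objects, len(counts)), dtype=int)
    for i, owned in enumerate(rows):
        for k in owned:
            Z[i, k] = 1
    return Z

Z = sample_ibp(num_objects=10, alpha=2.0, rng=np.random.default_rng(1))
print(Z.shape)
```

Every column of `Z` corresponds to a feature used by at least one object, so the matrix is always finite even though the model allows infinitely many features.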


Sample Efficient Feature Selection for Factored MDPs

arXiv.org Machine Learning

In reinforcement learning, the state of the real world is often represented by feature vectors. However, not all of the features may be pertinent for solving the current task. We propose Feature Selection Explore and Exploit (FS-EE), an algorithm that automatically selects the necessary features while learning a Factored Markov Decision Process, and prove that under mild assumptions, its sample complexity scales with the in-degree of the dynamics of just the necessary features, rather than the in-degree of all features. This can result in a much better sample complexity when the in-degree of the necessary features is smaller than the in-degree of all features.


Deep Probabilistic Programming

arXiv.org Machine Learning

We propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations: random variables and inference. By treating inference as a first-class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation to variational inference to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, we show on a benchmark logistic regression task that Edward is at least 35x faster than Stan and 6x faster than PyMC3. Further, Edward incurs no runtime overhead: it is as fast as handwritten TensorFlow.


Measuring Sample Quality with Stein's Method

arXiv.org Machine Learning

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.
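
The idea of a discrepancy built from Stein's method can be written out as follows. This is a sketch of the general construction with a Langevin-type Stein operator; the symbols $p$ (target density), $Q$ (the weighted sample with weights $q(x_i)$), $g$ (a vector-valued test function), and $\mathcal{G}$ (the class of admissible $g$) are notation assumed here, and the precise function class and regularity conditions are not reproduced.

```latex
% Stein operator for a target with differentiable density p:
(\mathcal{T}_P g)(x) = \langle g(x), \nabla \log p(x) \rangle + \langle \nabla, g(x) \rangle,
\qquad
\mathbb{E}_{P}\bigl[(\mathcal{T}_P g)(Z)\bigr] = 0 \quad \text{for suitable } g.

% Discrepancy of a weighted sample Q = \sum_{i=1}^{n} q(x_i)\, \delta_{x_i}:
\mathcal{S}(Q, \mathcal{T}_P, \mathcal{G})
  = \sup_{g \in \mathcal{G}} \Bigl| \sum_{i=1}^{n} q(x_i)\, (\mathcal{T}_P g)(x_i) \Bigr|.
```

Because the expectation under the target is zero by construction, the measure depends on the target only through $\nabla \log p$, so no integration under $P$ is needed and the normalizing constant of $p$ drops out.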


MultiView Diffusion Maps

arXiv.org Machine Learning

In this study we consider learning a reduced dimensionality representation from datasets obtained under multiple views. Such multiple views of datasets can be obtained, for example, when the same underlying process is observed using several different modalities, or measured with different instrumentation. Our goal is to effectively exploit the availability of such multiple views for various purposes, such as non-linear embedding, manifold learning, spectral clustering, anomaly detection and non-linear system identification. Our proposed method exploits the intrinsic relation within each view, as well as the mutual relations between views. We do this by defining a cross-view model, in which an implied Random Walk process between objects is restrained to hop between the different views. Our method is robust to scaling of each dataset, and is insensitive to small structural changes in the data. Within this framework, we define new diffusion distances and analyze the spectra of the implied kernels. We demonstrate the applicability of the proposed approach on both artificial and real data sets.
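
For orientation, the sketch below implements a plain single-view diffusion map: a Gaussian kernel is row-normalized into a random-walk transition matrix, whose leading nontrivial eigenvectors give the embedding. It does not implement the cross-view model described above; the function name `diffusion_map`, the bandwidth `epsilon`, and the random test data are assumptions made for illustration.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Single-view diffusion map embedding (illustrative sketch)."""
    # Pairwise squared Euclidean distances and Gaussian kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)
    # Row-stochastic transition matrix of the implied random walk.
    P = K / d[:, None]
    # S = D^{-1/2} K D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector; scale by eigenvalue^t.
    idx = slice(1, n_components + 1)
    embedding = psi[:, idx] * eigvals[idx] ** t
    return P, eigvals, embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
P, eigvals, embedding = diffusion_map(X, epsilon=2.0)
print(embedding.shape)
```

Euclidean distances between rows of `embedding` approximate diffusion distances at time `t`; a multi-view variant would replace the single kernel with one that couples the views.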