Goto

Collaborating Authors

 Undirected Networks


Understanding and Designing Complex Systems: Response to "A framework for optimal high-level descriptions in science and engineering---preliminary report"

arXiv.org Artificial Intelligence

Building compact models of nonlinear processes goes to the heart of our understanding the complex world around us--a world replete with unanticipated, emergent patterns. Via discovery mechanisms that we do not yet understand well, we eventually do come to know many of these patterns, even if we have never seen them before. Such discoveries can be substantial. At a minimum, compact models that capture such emergent "macrostates" are essential tools in harnessing complex processes to useful ends. Most ambitiously, one would hope to automate the discovery process itself, providing an especially useful tool for the era of Big Data. One key problem in the larger endeavor of pattern discovery is dimension reduction: reduce the high-dimensional state space of a stochastic dynamical system into smaller, more manageable models that nonetheless still capture the relevant dynamics. The study of complex systems always requires this.


Tutorial on Structured Continuous-Time Markov Processes

Journal of Artificial Intelligence Research

A continuous-time Markov process (CTMP) is a collection of variables indexed by a continuous quantity, time. It obeys the Markov property that the distribution over a future variable is independent of past variables given the state at the present time. We introduce continuous-time Markov process representations and algorithms for filtering, smoothing, expected sufficient statistics calculations, and model estimation, assuming no prior knowledge of continuous-time processes but some basic knowledge of probability and statistics. We begin by describing "flat" or unstructured Markov processes and then move to structured Markov processes (those arising from state spaces consisting of assignments to variables) including Kronecker, decision-diagram, and continuous-time Bayesian network representations. We provide the first connection between decision-diagrams and continuous-time Bayesian networks.


Circumventing the Curse of Dimensionality in Prediction: Causal Rate-Distortion for Infinite-Order Markov Processes

arXiv.org Machine Learning

Predictive rate-distortion analysis suffers from the curse of dimensionality: clustering arbitrarily long pasts to retain information about arbitrarily long futures requires resources that typically grow exponentially with length. The challenge is compounded for infinite-order Markov processes, since conditioning on finite sequences cannot capture all of their past dependencies. Spectral arguments show that algorithms which cluster finite-length sequences fail dramatically when the underlying process has long-range temporal correlations and can fail even for processes generated by finite-memory hidden Markov models. We circumvent the curse of dimensionality in rate-distortion analysis of infinite-order processes by casting predictive rate-distortion objective functions in terms of the forward- and reverse-time causal states of computational mechanics. Examples demonstrate that the resulting causal rate-distortion theory substantially improves current predictive rate-distortion analyses.


Iterative Bayesian Reconstruction of Non-IID Block-Sparse Signals

arXiv.org Machine Learning

This paper presents a novel Block Iterative Bayesian Algorithm (Block-IBA) for reconstructing block-sparse signals with unknown block structures. Unlike the existing algorithms for block sparse signal recovery which assume the cluster structure of the nonzero elements of the unknown signal to be independent and identically distributed (i.i.d.), we use a more realistic Bernoulli-Gaussian hidden Markov model (BGHMM) to characterize the non-i.i.d. block-sparse signals commonly encountered in practice. The Block-IBA iteratively estimates the amplitudes and positions of the block-sparse signal using the steepest-ascent based Expectation-Maximization (EM), and optimally selects the nonzero elements of the block-sparse signal by adaptive thresholding. The global convergence of Block-IBA is analyzed and proved, and the effectiveness of Block-IBA is demonstrated by numerical experiments and simulations on synthetic and real-life data.


Iterative Neural Autoregressive Distribution Estimator (NADE-k)

arXiv.org Machine Learning

Training of the neural autoregressive density estimator (NADE) can be viewed as doing one step of probabilistic inference on missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction in $k$ steps rather than to learn to reconstruct in a single inference step. The proposed model is an unsupervised building block for deep learning that combines the desirable properties of NADE and multi-predictive training: (1) Its test likelihood can be computed analytically, (2) it is easy to generate independent samples from it, and (3) it uses an inference engine that is a superset of variational inference for Boltzmann machines. The proposed NADE-k is competitive with the state-of-the-art in density estimation on the two datasets tested.


End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

arXiv.org Machine Learning

Dzmitry Bahdanau Jacobs University Bremen, Germany Yoshua Bengio Universitรฉ de Montrรฉal CIFAR Senior Fellow We replace the Hidden Markov Model (HMM) which is traditionally used in in continuous speech recognition with a bidirectional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created with a subset of input symbols selected by the attention mechanism. We report initial results demonstrating that this new approach achieves phoneme error rates that are comparable to the state-of-the-art HMM-based decoders, on the TIMIT dataset.


Detection of cheating by decimation algorithm

arXiv.org Machine Learning

We expand the item response theory to study the case of "cheating students" for a set of exams, trying to detect them by applying a greedy algorithm of inference. This extended model is closely related to the Boltzmann machine learning. In this paper we aim to infer the correct biases and interactions of our model by considering a relatively small number of sets of training data. Nevertheless, the greedy algorithm that we employed in the present study exhibits good performance with a few number of training data. The key point is the sparseness of the interactions in our problem in the context of the Boltzmann machine learning: the existence of cheating students is expected to be very rare (possibly even in real world). We compare a standard approach to infer the sparse interactions in the Boltzmann machine learning to our greedy algorithm and we find the latter to be superior in several aspects.


Fuzzy human motion analysis: A review

arXiv.org Artificial Intelligence

Human Motion Analysis (HMA) is currently one of the most popularly active research domains as such significant research interests are motivated by a number of real world applications such as video surveillance, sports analysis, healthcare monitoring and so on. However, most of these real world applications face high levels of uncertainties that can affect the operations of such applications. Hence, the fuzzy set theory has been applied and showed great success in the recent past. In this paper, we aim at reviewing the fuzzy set oriented approaches for HMA, individuating how the fuzzy set may improve the HMA, envisaging and delineating the future perspectives. To the best of our knowledge, there is not found a single survey in the current literature that has discussed and reviewed fuzzy approaches towards the HMA. For ease of understanding, we conceptually classify the human motion into three broad levels: Low-Level (LoL), Mid-Level (MiL), and High-Level (HiL) HMA.


Lifted Probabilistic Inference for Asymmetric Graphical Models

arXiv.org Artificial Intelligence

Lifted probabilistic inference algorithms have been successfully applied to a large number of symmetric graphical models. Unfortunately, the majority of real-world graphical models is asymmetric. This is even the case for relational representations when evidence is given. Therefore, more recent work in the community moved to making the models symmetric and then applying existing lifted inference algorithms. However, this approach has two shortcomings. First, all existing over-symmetric approximations require a relational representation such as Markov logic networks. Second, the induced symmetries often change the distribution significantly, making the computed probabilities highly biased. We present a framework for probabilistic sampling-based inference that only uses the induced approximate symmetries to propose steps in a Metropolis-Hastings style Markov chain. The framework, therefore, leads to improved probability estimates while remaining unbiased. Experiments demonstrate that the approach outperforms existing MCMC algorithms.


Learning graphical models from the Glauber dynamics

arXiv.org Machine Learning

In this paper we consider the problem of learning undirected graphical models from data generated according to the Glauber dynamics. The Glauber dynamics is a Markov chain that sequentially updates individual nodes (variables) in a graphical model and it is frequently used to sample from the stationary distribution (to which it converges given sufficient time). Additionally, the Glauber dynamics is a natural dynamical model in a variety of settings. This work deviates from the standard formulation of graphical model learning in the literature, where one assumes access to i.i.d. samples from the distribution. Much of the research on graphical model learning has been directed towards finding algorithms with low computational cost. As the main result of this work, we establish that the problem of reconstructing binary pairwise graphical models is computationally tractable when we observe the Glauber dynamics. Specifically, we show that a binary pairwise graphical model on $p$ nodes with maximum degree $d$ can be learned in time $f(d)p^2\log p$, for a function $f(d)$, using nearly the information-theoretic minimum number of samples.