Goto

Collaborating Authors

 Markov Models


Some Applications of Markov Chain in Python

@machinelearnbot

In this article a few simple applications of Markov chain are going to be discussed as a solution to a few text processing problems. These problems appeared as assignments in a few courses, the descriptions are taken straightaway from the courses themselves. Use a Markov chain to create a statistical model of a piece of English text. Simulate the Markov chain to generate stylized pseudo-random text. In the 1948 landmark paper A Mathematical Theory of Communication, Claude Shannon founded the field of information theory and revolutionized the telecommunications industry, laying the groundwork for today's Information Age. In this paper, Shannon proposed using a Markov chain to create a statistical model of the sequences of letters in a piece of English text. Markov chains are now widely used in speech recognition, handwriting recognition, information retrieval, data compression, and spam filtering. They also have many scientific computing applications including the genemark algorithm for gene prediction, the Metropolis algorithm for measuring thermodynamical properties, and Google's PageRank algorithm for Web search.


Automatic Classification of Music Genre using Masked Conditional Neural Networks

arXiv.org Machine Learning

Neural network based architectures used for sound recognition are usually adapted from other application domains such as image recognition, which may not harness the time-frequency representation of a signal. The ConditionaL Neural Networks (CLNN) and its extension the Masked ConditionaL Neural Networks (MCLNN) are designed for multidimensional temporal signal recognition. The CLNN is trained over a window of frames to preserve the inter-frame relation, and the MCLNN enforces a systematic sparseness over the network's links that mimics a filterbank-like behavior. The masking operation induces the network to learn in frequency bands, which decreases the network susceptibility to frequency-shifts in time-frequency representations. Additionally, the mask allows an exploration of a range of feature combinations concurrently analogous to the manual handcrafting of the optimum collection of features for a recognition task. MCLNN have achieved competitive performance on the Ballroom music dataset compared to several hand-crafted attempts and outperformed models based on state-of-the-art Convolutional Neural Networks.


Conditional Random Fields (CRF): Short Survey

@machinelearnbot

Currently, many of us are overwhelmed with mighty power of Deep Learning. We start to forget about humble graphical models. CRF is not so trendy as LSTM, but it is robust, reliable and worth noting. In this post, you will find a short summary about CRF (aka Conditional Random Fields) โ€“ what is this thing, what is it for and some interesting facts. In practical implementation, the computational time is often larger due to many other operations like numerical scaling, smoothing etc.


Generative Adversarial Networks (GANs): Engine and Applications

@machinelearnbot

GANs were introduced by Ian Goodfellow in 2014. Both of them are dedicated to extract features from data by learning the identity function f(x) x and both of them rely on Markov chains to train or to generate samples. Generative adversarial networks were designed to avoid using Markov chains because of the high computational cost of the latter. Another advantage relative to Boltzmann machines is that the Generator function has much fewer restrictions (there are only a few probability distributions that admit Markov chain sampling). In this article, we'll tell you how generative adversarial nets work and what their most popular applications in real life are.


Noisy Expectation-Maximization: Applications and Generalizations

arXiv.org Machine Learning

We present a noise-injected version of the Expectation-Maximization (EM) algorithm: the Noisy Expectation Maximization (NEM) algorithm. The NEM algorithm uses noise to speed up the convergence of the EM algorithm. The NEM theorem shows that injected noise speeds up the average convergence of the EM algorithm to a local maximum of the likelihood surface if a positivity condition holds. The generalized form of the noisy expectation-maximization (NEM) algorithm allow for arbitrary modes of noise injection including adding and multiplying noise to the data. We demonstrate these noise benefits on EM algorithms for the Gaussian mixture model (GMM) with both additive and multiplicative NEM noise injection. A separate theorem (not presented here) shows that the noise benefit for independent identically distributed additive noise decreases with sample size in mixture models. This theorem implies that the noise benefit is most pronounced if the data is sparse. Injecting blind noise only slowed convergence.


The History Of Speech Recognition And A Glimpse Into Its Future

#artificialintelligence

With the release of Apple's Siri and comparable voice search assistance from Microsoft and Google, you might have speculated why it took so long for speech recognition innovation to progress to this stage. In addition, one may also wonder what the future holds for natural language-based machine intelligence learning and its impact on our everyday lives. A closer look at the history and development of voice recognition technology may be somewhat akin to watching a toddler grow up, advancing from the baby-talk level and developing terminologies of countless words to responding to queries with fast, amusing repartees, just like what the clever digital assistant Siri does. Here is a close depiction at the innovations of the past generations with regards to speech recognition and what the future has in store for this technology. The "Audrey" system is the earliest speech recognition device that could recognize only digits.


Distributed Constraint Optimization Problems and Applications: A Survey

arXiv.org Artificial Intelligence

The field of Multi-Agent System (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact in industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled them to support MAS in complex, real-time, and uncertain environments. This survey aims at providing an overview of the DCOP model, giving a classification of its multiple extensions and addressing both resolution methods and applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions, and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.


Stochastic Gradient Monomial Gamma Sampler

arXiv.org Machine Learning

Recent advances in stochastic gradient techniques have made it possible to estimate posterior distributions from large datasets via Markov Chain Monte Carlo (MCMC). However, when the target posterior is multimodal, mixing performance is often poor. This results in inadequate exploration of the posterior distribution. A framework is proposed to improve the sampling efficiency of stochastic gradient MCMC, based on Hamiltonian Monte Carlo. A generalized kinetic function is leveraged, delivering superior stationary mixing, especially for multimodal distributions. Techniques are also discussed to overcome the practical issues introduced by this generalization. It is shown that the proposed approach is better at exploring complex multimodal posterior distributions, as demonstrated on multiple applications and in comparison with other stochastic gradient MCMC methods.


Weighted Contrastive Divergence

arXiv.org Machine Learning

Learning algorithms for energy based Boltzmann architectures that rely on gradient descent are in general computationally prohibitive, typically due to the exponential number of terms involved in computing the partition function. In this way one has to resort to approximation schemes for the evaluation of the gradient. This is the case of Restricted Boltzmann Machines (RBM) and its learning algorithm Contrastive Divergence (CD). It is well-known that CD has a number of shortcomings, and its approximation to the gradient has several drawbacks. Overcoming these defects has been the basis of much research and new algorithms have been devised, such as persistent CD. In this manuscript we propose a new algorithm that we call Weighted CD (WCD), built from small modifications of the negative phase in standard CD. However small these modifications may be, experimental work reported in this paper suggest that WCD provides a significant improvement over standard CD and persistent CD at a small additional computational cost.


Log-concave sampling: Metropolis-Hastings algorithms are fast!

arXiv.org Machine Learning

We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). The method draws samples by running a Markov chain obtained from the discretization of an appropriate Langevin diffusion, combined with an accept-reject step to ensure the correct stationary distribution. Relative to known guarantees for the unadjusted Langevin algorithm (ULA), our bounds show that the use of an accept-reject step in MALA leads to an exponentially improved dependence on the error-tolerance. Concretely, in order to obtain samples with TV error at most $\delta$ for a density with condition number $\kappa$, we show that MALA requires $\mathcal{O} \big(\kappa d \log(1/\delta) \big)$ steps, as compared to the $\mathcal{O} \big(\kappa^2 d/\delta^2 \big)$ steps established in past work on ULA. We also demonstrate the gains of MALA over ULA for weakly log-concave densities. Furthermore, we derive mixing time bounds for a zeroth-order method Metropolized random walk (MRW) and show that it mixes $\mathcal{O}(\kappa d)$ slower than MALA. We provide numerical examples that support our theoretical findings, and demonstrate the potential gains of Metropolis-Hastings adjustment for Langevin-type algorithms.