Markov Models
Hidden Markov Models for Regime Detection using R - QuantStart
The previous article in this series introduced Hidden Markov Models and discussed them in the context of the broader class of Markov Models. They were motivated by the need for quantitative traders to detect market regimes in order to adjust how their quant strategies are managed. In particular, it was noted that "various regimes lead to adjustments of asset returns via shifts in their means, variances/volatilities, serial correlation and covariances, which impact the effectiveness of time series methods that rely on stationarity". This has a significant bearing on how trading strategies are modified throughout the strategy lifecycle.
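To make the idea of regime detection concrete, here is a minimal Python sketch of the forward (filtering) recursion for a two-state HMM with Gaussian emissions. It is not the article's R code: the function names, the regime parameters, and the sample returns are all hypothetical values chosen for illustration, and the model parameters are assumed to be known rather than fitted.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def filter_regimes(returns, means, stds, transition, initial):
    """Return P(regime at t | returns up to t) for every t.

    `initial` is the belief over regimes before the first observation, and
    `transition[i][j]` is the probability of moving from regime i to regime j.
    """
    n_states = len(means)
    belief = list(initial)
    filtered = []
    for r in returns:
        # Predict: push the current belief one step through the transition matrix.
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        # Update: weight each regime by how well it explains the observed return.
        weighted = [
            predicted[j] * gaussian_pdf(r, means[j], stds[j])
            for j in range(n_states)
        ]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        filtered.append(belief)
    return filtered


# Hypothetical parameters: regime 0 is calm, regime 1 is turbulent.
means = [0.0005, -0.001]
stds = [0.005, 0.03]
transition = [[0.95, 0.05], [0.10, 0.90]]
initial = [0.5, 0.5]
returns = [0.001, -0.002, 0.0015, 0.04, -0.05, 0.03]

for r, belief in zip(returns, filter_regimes(returns, means, stds, transition, initial)):
    print(f"return={r:+.4f}  P(calm)={belief[0]:.3f}  P(turbulent)={belief[1]:.3f}")
```

With these assumed parameters the small returns are attributed mostly to the calm regime and the large swings to the turbulent one. In practice the means, volatilities, and transition probabilities would be estimated from data rather than fixed by hand.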
Man VS Machine: The Secrets Behind Alibaba Cloud's Speech Recognition Technology - AliCloud Developer Forums: Cloud Discussion Forums
Introduction: In the previous article, we described combat performance in the Artificial Intelligence PK Gold Medal Stenography Competition and told the story behind the annual Alibaba Cloud meeting's Man VS Machine competition. Are there any curious technology geeks out there? What was the on-site real-time transcription system? What on earth is the core of a speech recognition system? How come the Alibaba Cloud iDST speech recognition system is so accurate?
Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models
Krakovna, Viktoriya, Doshi-Velez, Finale
As deep neural networks continue to revolutionize various application domains, there is increasing interest in making these powerful models more understandable and interpretable, and narrowing down the causes of good and bad predictions. We focus on recurrent neural networks (RNNs), state of the art models in speech recognition and translation. Our approach to increasing interpretability is by combining an RNN with a hidden Markov model (HMM), a simpler and more transparent model. We explore various combinations of RNNs and HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained first, then a small LSTM is given HMM state distributions and trained to fill in gaps in the HMM's performance; and a jointly trained hybrid model. We find that the LSTM and HMM learn complementary information about the features in the text.
Multi-label Methods for Prediction with Sequential Data
Read, Jesse, Martino, Luca, Hollmén, Jaakko
The number of methods available for classification of multi-label data has increased rapidly over recent years, yet relatively few links have been made with the related task of classification of sequential data. If label indices are considered as time indices, the problems can often be seen as equivalent. In this paper we detect and elaborate on connections between multi-label methods and Markovian models, and study the suitability of multi-label methods for prediction in sequential data. From this study we draw upon the most suitable techniques from the area and develop two novel competitive approaches which can be applied to either kind of data. We carry out an empirical evaluation investigating performance on real-world sequential-prediction tasks: electricity demand, and route prediction. As well as showing that several popular multi-label algorithms are in fact easily applicable to sequencing tasks, our novel approaches, which benefit from a unified view of these areas, prove very competitive against established methods.
Keywords: multi-label classification; problem transformation; sequential data; sequence prediction; Markov models
Preprint submitted to Pattern Recognition, September 29, 2016. Corresponding author: jesse.read@polytechnique.edu
1. Introduction
Multi-label classification is the supervised learning problem where an instance is associated with multiple class variables (i.e., labels), rather than with a single class, as in traditional classification problems. See [1] for a review. [...] labels were modelled independently - at the expense of an increased computational cost. The case of binary labels is most common, where a positive class value denotes the relevance of the label (and the negative or null class denotes irrelevance). Typical examples of binary multi-label classification involve categorizing text documents and images, which can be assigned any subset of a particular label set.
For example, an image can be associated with both labels beach and sunset. The multi-label classification paradigm has been successfully considered also in many other domains, such as text, video, audio, and bioinformatics - see [1] and references therein for further examples.
Whole-brain substitute CT generation using Markov random field mixture models
Hildeman, Anders, Bolin, David, Wallin, Jonas, Johansson, Adam, Nyholm, Tufve, Asklund, Thomas, Yu, Jun
Computed tomography (CT) equivalent information is needed for attenuation correction in PET imaging and for dose planning in radiotherapy. Prior work has shown that Gaussian mixture models can be used to generate a substitute CT (s-CT) image from a specific set of MRI modalities. This work introduces a more flexible class of mixture models for s-CT generation that incorporates spatial dependency in the data through a Markov random field prior on the latent field of class memberships associated with a mixture model. Furthermore, the mixture distributions are extended from Gaussian to normal inverse Gaussian (NIG), allowing heavier tails and skewness. The amount of data needed to train a model for s-CT generation is of the order of 100 million voxels. The computational efficiency of the parameter estimation and prediction methods is hence paramount, especially when spatial dependency is included in the models. A stochastic Expectation Maximization (EM) gradient algorithm is proposed in order to tackle this challenge. The advantages of the spatial model and NIG distributions are evaluated with a cross-validation study based on data from 14 patients. The study shows that the proposed model enhances the predictive quality of the s-CT images by reducing the mean absolute error by 17.9%. Also, the distribution of CT values conditioned on the MR images is better explained by the proposed model, as evaluated using continuous ranked probability scores.
Predictive Coarse-Graining
Schöberl, Markus, Zabaras, Nicholas, Koutsourelakis, Phaedon-Stelios
We propose a data-driven, coarse-graining formulation in the context of equilibrium statistical mechanics. In contrast to existing techniques which are based on a fine-to-coarse map, we adopt the opposite strategy by prescribing a probabilistic coarse-to-fine map. This corresponds to a directed probabilistic model where the coarse variables play the role of latent generators of the fine scale (all-atom) data. From an information-theoretic perspective, the framework proposed provides an improvement upon the relative entropy method and is capable of quantifying the uncertainty due to the information loss that unavoidably takes place during the CG process. Furthermore, it can be readily extended to a fully Bayesian model where various sources of uncertainties are reflected in the posterior of the model parameters. The latter can be used to produce not only point estimates of fine-scale reconstructions or macroscopic observables, but more importantly, predictive posterior distributions on these quantities. Predictive posterior distributions reflect the confidence of the model as a function of the amount of data and the level of coarse-graining. The issues of model complexity and model selection are seamlessly addressed by employing a hierarchical prior that favors the discovery of sparse solutions, revealing the most prominent features in the coarse-grained model. A flexible and parallelizable Monte Carlo - Expectation-Maximization (MC-EM) scheme is proposed for carrying out inference and learning tasks. A comparative assessment of the proposed methodology is presented for a lattice spin system and the SPC/E water model.
Markov Chain Monte Carlo - Nice R Code
This topic doesn't have much to do with nicer code, but there is probably some overlap in interest. However, some of the topics that we cover arise naturally here, so read on! MCMC is simply an algorithm for sampling from a distribution. The term stands for "Markov Chain Monte Carlo", because it is a type of "Monte Carlo" (i.e., a random) method that uses "Markov chains" (we'll discuss these later). MCMC is just one type of Monte Carlo method, although it is possible to view many other commonly used methods as simply special cases of MCMC.
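As a small illustration of "an algorithm for sampling from a distribution", the following Python sketch implements a random-walk Metropolis sampler, one common MCMC algorithm. The excerpt above does not say which algorithm the post goes on to use, so this choice, the function names, the step size, and the standard normal target are all assumptions made for the example.

```python
import math
import random


def metropolis(log_density, start, n_samples, step_size, seed=0):
    """Random-walk Metropolis: draw a chain whose stationary law is exp(log_density)."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current position.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        # The chain records the current state whether or not the move was accepted.
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    """Log density of N(0, 1), up to an additive constant."""
    return -0.5 * x * x


draws = metropolis(standard_normal_log_density, start=0.0, n_samples=20000, step_size=1.0)
kept = draws[2000:]  # discard an initial burn-in stretch
sample_mean = sum(kept) / len(kept)
sample_var = sum((d - sample_mean) ** 2 for d in kept) / len(kept)
print(f"mean={sample_mean:.3f}  variance={sample_var:.3f}")
```

Each new state depends only on the previous one, which is what makes the sequence a Markov chain. The target density only needs to be known up to a constant, because the acceptance step uses a ratio of densities.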
Uniform $\varepsilon$-Stability of Distributed Nonlinear Filtering over DNAs: Gaussian-Finite HMMs
Kalogerias, Dionysios S., Petropulu, Athina P.
In this work, we study stability of distributed filtering of Markov chains with finite state space, partially observed in conditionally Gaussian noise. We consider a nonlinear filtering scheme over a Distributed Network of Agents (DNA), which relies on the distributed evaluation of the likelihood part of the centralized nonlinear filter and is based on a particular specialization of the Alternating Direction Method of Multipliers (ADMM) for fast average consensus. Assuming the same number of consensus steps between any two consecutive noisy measurements for each sensor in the network, we fully characterize a minimal number of such steps, such that the distributed filter remains uniformly stable with a prescribed accuracy level, $\varepsilon \in (0,1]$, within a finite operational horizon, $T$, and across all sensors. Stability is in the sense of the $\ell_1$-norm between the centralized and distributed versions of the posterior at each sensor, and at each time within $T$. Roughly speaking, our main result shows that uniform $\varepsilon$-stability of the distributed filtering process depends only loglinearly on $T$ and (roughly) the size of the network, and only logarithmically on $1/\varepsilon$. If this total loglinear bound is fulfilled, any additional consensus iterations will incur a fully quantified further exponential decay in the consensus error. Our bounds are universal, in the sense that they are independent of the particular structure of the Gaussian Hidden Markov Model (HMM) under consideration.
Fast Learning of Clusters and Topics via Sparse Posteriors
Hughes, Michael C., Sudderth, Erik B.
Mixture models and topic models generate each observation from a single cluster, but standard variational posteriors for each observation assign positive probability to all possible clusters. This requires dense storage and runtime costs that scale with the total number of clusters, even though typically only a few clusters have significant posterior mass for any data point. We propose a constrained family of sparse variational distributions that allow at most $L$ non-zero entries, where the tunable threshold $L$ trades off speed for accuracy. Previous sparse approximations have used hard assignments ($L=1$), but we find that moderate values of $L>1$ provide superior performance. Our approach easily integrates with stochastic or incremental optimization algorithms to scale to millions of examples. Experiments training mixture models of image patches and topic models for news articles show that our approach produces better-quality models in far less time than baseline methods.
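The constraint of at most $L$ non-zero entries can be illustrated with a short Python sketch. It assumes one simple way of producing such a vector, namely keeping the $L$ largest entries of a dense posterior and renormalizing them. The function name and the example probabilities are hypothetical, and this is not presented as the authors' implementation.

```python
import heapq


def sparsify_posterior(probabilities, max_nonzero):
    """Keep the `max_nonzero` largest entries and renormalize them to sum to one.

    Returns a dict mapping cluster index to probability, so storage scales with
    `max_nonzero` rather than with the total number of clusters.
    Assumes the kept entries have positive total mass.
    """
    if max_nonzero < 1:
        raise ValueError("max_nonzero must be at least 1")
    top = heapq.nlargest(
        max_nonzero, range(len(probabilities)), key=probabilities.__getitem__
    )
    mass = sum(probabilities[k] for k in top)
    return {k: probabilities[k] / mass for k in top}


# Hypothetical dense posterior over five clusters for one observation.
dense = [0.50, 0.30, 0.15, 0.04, 0.01]

print(sparsify_posterior(dense, 1))  # hard assignment (L = 1)
print(sparsify_posterior(dense, 2))  # moderate sparsity (L = 2)
```

Setting `max_nonzero` to 1 recovers a hard assignment, while larger values retain more of the posterior at a higher storage and runtime cost.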