Markov Models
Towards Representation Learning with Tractable Probabilistic Models
Vergari, Antonio, Di Mauro, Nicola, Esposito, Floriana
Probabilistic models learned as density estimators can be exploited for representation learning, besides serving as toolboxes for answering inference queries. However, how to extract useful representations depends heavily on the particular model involved. We argue that tractable inference, i.e. inference that can be computed in polynomial time, can enable general schemes to extract features from black box models. We plan to investigate how Tractable Probabilistic Models (TPMs) can be exploited to generate embeddings by random query evaluations. We devise two experimental designs to assess and compare different TPMs as feature extractors in an unsupervised representation learning framework. We show some experimental results on standard image datasets by applying such a method to Sum-Product Networks and Mixtures of Trees as tractable models generating embeddings.
A Distance for HMMs based on Aggregated Wasserstein Metric and State Registration
Chen, Yukun, Ye, Jianbo, Li, Jia
We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time spot follows a Gaussian mixture distribution, a fact exploited to softly match, aka register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMM (GMM-HMM). The registration of states is inspired by the intrinsic relationship of optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. Our new distance is tested on the tasks of retrieval and classification of time series. Experiments on both synthetic data and real data have demonstrated its advantages in terms of accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.
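As a rough illustration of the component cost mentioned in this abstract, the sketch below computes the 2-Wasserstein distance between two Gaussians in closed form. It assumes diagonal covariances, which is a simplification made here for brevity and not something the abstract states; full covariances would need a matrix square root. The function name `w2_gaussian_diag` is hypothetical.

```python
import math


def w2_gaussian_diag(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between two Gaussians with diagonal covariance.

    Assumption (not from the abstract): covariances are diagonal, so the
    squared distance reduces to ||mean_a - mean_b||^2 + ||std_a - std_b||^2.
    """
    squared = 0.0
    for ma, sa, mb, sb in zip(mean_a, std_a, mean_b, std_b):
        squared += (ma - mb) ** 2 + (sa - sb) ** 2
    return math.sqrt(squared)


# Two 2-D components with equal spread: only the mean shift contributes.
distance = w2_gaussian_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

In an optimal transport formulation such as the one the abstract describes, a value like this would fill one entry of the cost matrix between a component of one mixture and a component of the other.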
A Stochastic Temporal Model of Polyphonic MIDI Performance with Ornaments
Nakamura, Eita, Ono, Nobutaka, Sagayama, Shigeki, Watanabe, Kenji
We study indeterminacies in realization of ornaments and how they can be incorporated in a stochastic performance model applicable for music information processing such as score-performance matching. We point out the importance of temporal information, and propose a hidden Markov model which describes it explicitly and represents ornaments with several state types. Following a review of the indeterminacies, they are carefully incorporated into the model through its topology and parameters, and the state construction for quite general polyphonic scores is explained in detail. By analyzing piano performance data, we find significant overlaps in inter-onset-interval distributions of chordal notes, ornaments, and inter-chord events, and the data is used to determine details of the model. The model is applied for score following and offline score-performance matching, yielding highly accurate matching for performances with many ornaments and relatively frequent errors, repeats, and skips.
Multi Level Monte Carlo methods for a class of ergodic stochastic differential equations
Szpruch, Lukasz, Vollmer, Sebastian, Zygalakis, Konstantinos, Giles, Michael B.
We develop a framework that allows the use of the multi-level Monte Carlo (MLMC) methodology (Giles 2015) to calculate expectations with respect to the invariant measures of ergodic SDEs. In that context, we study the (over-damped) Langevin equations with strongly convex potential. We show that, when appropriate contracting couplings for the numerical integrators are available, one can obtain time-uniform estimates of the MLMC variance, in stark contrast to the majority of the results in the MLMC literature. As a consequence, one can approximate expectations with respect to the invariant measure in an unbiased way without the need for a Metropolis-Hastings step. In addition, a root mean square error of $\mathcal{O}(\epsilon)$ is achieved with $\mathcal{O}(\epsilon^{-2})$ complexity, on par with Markov Chain Monte Carlo (MCMC) methods, which however can be computationally intensive when applied to large data sets. Finally, we present a multilevel version of the recently introduced Stochastic Gradient Langevin (SGLD) method (Welling and Teh, 2011) built for large-dataset applications. We show that this is the first stochastic gradient MCMC method with complexity $\mathcal{O}(\epsilon^{-2}|\log \epsilon|^{3})$, which is asymptotically an order $\epsilon$ lower than the $\mathcal{O}(\epsilon^{-3})$ complexity of all stochastic gradient MCMC methods that are currently available. Numerical experiments confirm our theoretical findings.
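For readers unfamiliar with the over-damped Langevin equation, the following is a minimal single-level sketch of its Euler-Maruyama discretization, x_{k+1} = x_k - h U'(x_k) + sqrt(2h) xi_k. It is not the multilevel scheme of the paper. The potential U(x) = x^2 / 2, the step size, and the function name `langevin_samples` are illustrative assumptions.

```python
import math
import random


def langevin_samples(grad_u, x0, step, n_steps, seed=0):
    """Euler-Maruyama discretization of dX = -U'(X) dt + sqrt(2) dW.

    Plain single-chain sampler without a Metropolis-Hastings correction,
    so the samples carry a step-size dependent bias.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x - step * grad_u(x) + math.sqrt(2.0 * step) * noise
        samples.append(x)
    return samples


# Assumed example potential U(x) = x^2 / 2 (strongly convex), so U'(x) = x
# and the invariant measure of the continuous-time SDE is N(0, 1).
samples = langevin_samples(lambda x: x, x0=0.0, step=0.1, n_steps=20000)
kept = samples[1000:]  # discard a burn-in prefix
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

For this quadratic potential the discretized chain has stationary variance 1 / (1 - h/2) instead of 1, which is the kind of discretization bias that a Metropolis-Hastings step, or the multilevel construction described in the abstract, is meant to remove.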
Multiple Testing for Neuroimaging via Hidden Markov Random Field
Shu, Hai, Nan, Bin, Koeppe, Robert
Traditional voxel-level multiple testing procedures in neuroimaging, mostly $p$-value based, often ignore the spatial correlations among neighboring voxels and thus suffer from substantial loss of power. We extend the local-significance-index based procedure originally developed for the hidden Markov chain models, which aims to minimize the false nondiscovery rate subject to a constraint on the false discovery rate, to three-dimensional neuroimaging data using a hidden Markov random field model. A generalized expectation-maximization algorithm for maximizing the penalized likelihood is proposed for estimating the model parameters. Extensive simulations show that the proposed approach is more powerful than conventional false discovery rate procedures. We apply the method to the comparison between mild cognitive impairment, a disease status with increased risk of developing Alzheimer's or another dementia, and normal controls in the FDG-PET imaging study of the Alzheimer's Disease Neuroimaging Initiative.
RAND-WALK: A Latent Variable Model Approach to Word Embeddings
Arora, Sanjeev, Li, Yuanzhi, Liang, Yingyu, Ma, Tengyu, Risteski, Andrej
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of Mnih and Hinton (2007). The methodological novelty is to use the prior to compute closed form expressions for word statistics. This provides a theoretical justification for nonlinear models like PMI, word2vec, and GloVe, as well as some hyperparameter choices. It also helps explain why low-dimensional semantic embeddings contain linear algebraic structure that allows solution of word analogies, as shown by Mikolov et al. (2013) and many subsequent papers. Experimental support is provided for the generative model assumptions, the most important of which is that latent word vectors are fairly uniformly dispersed in space.
Artificial intelligence - Wikipedia, the free encyclopedia
Artificial intelligence (AI) is intelligence exhibited by machines. In computer science, an ideal "intelligent" machine is a flexible rational agent that perceives its environment and takes actions that maximize its chance of success at some goal.[1] Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving".[2] As machines become increasingly capable, facilities once thought to require intelligence are removed from the definition. For example, optical character recognition is no longer perceived as an exemplar of "artificial intelligence" having become a routine technology.[3] Capabilities still classified as AI include advanced Chess and Go systems and self-driving cars. AI research is divided into subfields[4] that focus on specific problems or on specific approaches or on the use of a particular tool or towards satisfying particular applications. The central problems (or goals) of AI research include reasoning, knowledge, planning, learning, natural language processing (communication), perception and the ability to move and manipulate objects.[5] General intelligence is among the field's long-term goals.[6] Approaches include statistical methods, computational intelligence, soft computing (e.g. machine learning), and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, logic, methods based on probability and economics. The AI field draws upon computer science, mathematics, psychology, linguistics, philosophy, neuroscience and artificial psychology. 
The field was founded on the claim that human intelligence "can be so precisely described that a machine can be made to simulate it."[7] This raises philosophical arguments about the nature of the mind and the ethics of creating artificial beings endowed with human-like intelligence, issues which have been explored by myth, fiction and philosophy since antiquity.[8] Attempts to create artificial intelligence have experienced many setbacks, including the ALPAC report of 1966, the abandonment of perceptrons in 1970, the Lighthill Report of 1973 and the collapse of the Lisp machine market in 1987. In the twenty-first century AI techniques became an essential part of the technology industry, helping to solve many challenging problems in computer science.[9]
An Introduction to Language Modeling With N-Grams and Markov Chains
N = 1: "Unigram (or, you know, a word)", i.e. "The". "A Markov chain is a probabilistic model well suited to semi-coherent text synthesis." While straining to live in Ukraine with anxiety and broad range of my surroundings, along the ones I felt physically threatened and the rush I burst into a ten-year old who they sought a poem that matters, I was I should be invincible. Who would have paid for granted, but maybe it was asked to further education is an annual overnight to San Diego, water fun, cheers, a year, I still burn in the invisible enemy in the night when I cannot feel the traffic outside the times I want to a missionary would be neither relived nor reanimated. I assume the status quo, seems fair; I were a stylish figure, for me, and knees.
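The kind of semi-coherent synthesis quoted above can be sketched with a small n-gram Markov chain. This is a minimal illustration, not the presenter's code: the function names `build_chain` and `generate`, the `order` parameter, and the toy corpus are all assumptions made here.

```python
import random
from collections import defaultdict


def build_chain(words, order=1):
    """Map each n-gram (tuple of `order` words) to the words that follow it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, start, length, seed=0):
    """Walk the chain from `start`, sampling each next word from the
    followers observed after the current n-gram."""
    rng = random.Random(seed)
    state = tuple(start)
    output = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this n-gram never had a successor
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus, order=1)
text = generate(chain, ["the"], length=8)
```

Because followers are stored with repetition, sampling uniformly from each list reproduces the empirical transition frequencies. Raising `order` makes the output more coherent but closer to the source text.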
A Beginner's Tutorial for Restricted Boltzmann Machines - Deeplearning4j: Open-source, distributed deep learning for the JVM
Invented by Geoff Hinton, a Restricted Boltzmann machine is an algorithm useful for dimensionality reduction, classification, regression, collaborative filtering, feature learning and topic modeling. Given their relative simplicity and historical importance, restricted Boltzmann machines are the first neural network we'll tackle. In the paragraphs below, we describe in diagrams and plain language how they work. RBMs are shallow, two-layer neural nets that constitute the building blocks of deep-belief networks. The first layer of the RBM is called the visible, or input, layer, and the second is the hidden layer. Each circle in the graph above represents a neuron-like unit called a node, and nodes are simply where calculations take place.
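To make the two-layer structure concrete, the sketch below computes the activation probability of each hidden node from a binary visible vector, using the standard RBM conditional p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij). The weights, biases, and function names are made-up illustrations and do not come from the tutorial or from the Deeplearning4j API.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probabilities(visible, weights, hidden_bias):
    """Probability that each hidden node turns on, given the visible layer.

    weights[i][j] connects visible node i to hidden node j; there are no
    visible-visible or hidden-hidden connections, which is what makes the
    machine "restricted".
    """
    probs = []
    for j, bias in enumerate(hidden_bias):
        activation = bias + sum(v * weights[i][j] for i, v in enumerate(visible))
        probs.append(sigmoid(activation))
    return probs


# Hypothetical toy network: 2 visible nodes, 2 hidden nodes.
visible = [1, 0]
weights = [[0.0, 2.0],
           [5.0, 5.0]]
hidden_bias = [0.0, -2.0]
probs = hidden_probabilities(visible, weights, hidden_bias)
```

Each hidden node is where one such calculation takes place: a weighted sum of its inputs plus a bias, passed through a nonlinearity.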