Markov Models
Discriminative training for Convolved Multiple-Output Gaussian processes
Gómez-González, Sebastián, Álvarez, Mauricio A., García, Hernán Felipe
Multi-output Gaussian processes (MOGP) are probability distributions over vector-valued functions, and have been previously used for multi-output regression and for multi-class classification. A less explored facet of the multi-output Gaussian process is that it can be used as a generative model for vector-valued random fields in the context of pattern recognition. As a generative model, the multi-output GP is able to handle vector-valued functions with continuous inputs, as opposed, for example, to hidden Markov models. It also offers the ability to model multivariate random functions with high dimensional inputs. In this report, we use a discriminative training criteria known as Minimum Classification Error to fit the parameters of a multi-output Gaussian process. We compare the performance of generative training and discriminative training of MOGP in emotion recognition, activity recognition, and face recognition. We also compare the proposed methodology against hidden Markov models trained in a generative and in a discriminative way.
A Confident Information First Principle for Parametric Reduction and Model Selection of Boltzmann Machines
Zhao, Xiaozhao, Hou, Yuexian, Song, Dawei, Li, Wenjie
Typical dimensionality reduction (DR) methods are often data-oriented, focusing on directly reducing the number of random variables (features) while retaining the maximal variations in the high-dimensional data. In unsupervised situations, one of the main limitations of these methods lies in their dependency on the scale of data features. This paper aims to address the problem from a new perspective and considers model-oriented dimensionality reduction in parameter spaces of binary multivariate distributions. Specifically, we propose a general parameter reduction criterion, called Confident-Information-First (CIF) principle, to maximally preserve confident parameters and rule out less confident parameters. Formally, the confidence of each parameter can be assessed by its contribution to the expected Fisher information distance within the geometric manifold over the neighbourhood of the underlying real distribution. We then revisit Boltzmann machines (BM) from a model selection perspective and theoretically show that both the fully visible BM (VBM) and the BM with hidden units can be derived from the general binary multivariate distribution using the CIF principle. This can help us uncover and formalize the essential parts of the target density that BM aims to capture and the non-essential parts that BM should discard. Guided by the theoretical analysis, we develop a sample-specific CIF for model selection of BM that is adaptive to the observed samples. The method is studied in a series of density estimation experiments and has been shown effective in terms of the estimate accuracy.
Learning Planar Ising Models
Johnson, Jason K., Oyen, Diane, Chertkov, Michael, Netrapalli, Praneeth
Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a simple greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. We demonstrate our method in simulations and for the application of modeling senate voting records.
Value Iteration with Options and State Aggregation
This paper presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together and indeed it is only after one combines the two that the full benefit of each is realized. We introduce a hierarchical value iteration algorithm where we first coarsely solve subgoals and then use these approximate solutions to exactly solve the MDP. This algorithm solved several problems faster than vanilla value iteration.
Submodular relaxation for inference in Markov random fields
The problem of inference in a Markov random field (MRF) arises in many applied domains, e.g. in machine learning, computer vision, natural language processing, etc. In this paper we focus on one important type of inference: maximum a posteriori (MAP) inference, often referred to as MRF energy minimization. Inference of this type is a combinatorial optimization problem, i.e. an optimization problem with the finite domain. The most studied case of MRF energy minimization is the situation when the energy can be represented as a sum of terms (potentials) that depend on only one or two variables each (unary and pairwise potentials). In this setting the energy is said to be defined by a graph where the nodes correspond to the variables and the edges to the pairwise potentials. Minimization of energies defined on graphs in known to be NPhard in general [8] but can be done exactly in polynomial time in a number of special cases, e.g. if the graph defining the energy is acyclic [36] or if the energy is submodular in standard [28] or multi-label sense [10]. One way to go beyond pairwise potentials is to add higher-order summands to the energy. For example, Kohli et al. [23] and Ladický et al. [32] use high-order potentials based on superpixels (image regions) for semantic image segmentation; Delong et al. [11] use label cost potentials for geometric model fitting tasks. To be tractable, high-order potentials need to have a compact representation.
Declarative Statistical Modeling with Datalog
Barany, Vince, Cate, Balder ten, Kimelfeld, Benny, Olteanu, Dan, Vagena, Zografoula
Formalisms for specifying statistical models, such as probabilistic-programming languages, typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate a declarative framework for specifying statistical models on top of a database, through an appropriate extension of Datalog. By virtue of extending Datalog, our framework offers a natural integration with the database, and has a robust declarative semantics. Our Datalog extension provides convenient mechanisms to include numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program; these outcomes are minimal solutions with respect to a related program with existentially quantified variables in conclusions. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. We focus on programs that use discrete numerical distributions, but even then the space of possible outcomes may be uncountable (as a solution can be infinite). We define a probability measure over possible outcomes by applying the known concept of cylinder sets to a probabilistic chase procedure. We show that the resulting semantics is robust under different chases. We also identify conditions guaranteeing that all possible outcomes are finite (and then the probability space is discrete). We argue that the framework we propose retains the purely declarative nature of Datalog, and allows for natural specifications of statistical models.
The Learnability of Unknown Quantum Measurements
Cheng, Hao-Chung, Hsieh, Min-Hsiu, Yeh, Ping-Cheng
Quantum machine learning has received significant attention in recent years, and promising progress has been made in the development of quantum algorithms to speed up traditional machine learning tasks. In this work, however, we focus on investigating the information-theoretic upper bounds of sample complexity - how many training samples are sufficient to predict the future behaviour of an unknown target function. This kind of problem is, arguably, one of the most fundamental problems in statistical learning theory and the bounds for practical settings can be completely characterised by a simple measure of complexity. Our main result in the paper is that, for learning an unknown quantum measurement, the upper bound, given by the fat-shattering dimension, is linearly proportional to the dimension of the underlying Hilbert space. Learning an unknown quantum state becomes a dual problem to ours, and as a byproduct, we can recover Aaronson's famous result [Proc. R. Soc. A 463:3089-3144 (2007)] solely using a classical machine learning technique. In addition, other famous complexity measures like covering numbers and Rademacher complexities are derived explicitly. We are able to connect measures of sample complexity with various areas in quantum information science, e.g. quantum state/measurement tomography, quantum state discrimination and quantum random access codes, which may be of independent interest. Lastly, with the assistance of general Bloch-sphere representation, we show that learning quantum measurements/states can be mathematically formulated as a neural network. Consequently, classical ML algorithms can be applied to efficiently accomplish the two quantum learning tasks.
Signal Aggregate Constraints in Additive Factorial HMMs, with Application to Energy Disaggregation
Zhong, Mingjun, Goddard, Nigel, Sutton, Charles
Blind source separation problems are difficult because they are inherently unidentifiable, yet the entire goal is to identify meaningful sources. We introduce a way of incorporating domain knowledge into this problem, called signal aggregate constraints (SACs). SACs encourage the total signal for each of the unknown sources to be close to a specified value. This is based on the observation that the total signal often varies widely across the unknown sources, and we often have a good idea of what total values to expect. We incorporate SACs into an additive factorial hidden Markov model (AFHMM) to formulate the energy disaggregation problems where only one mixture signal is assumed to be observed. A convex quadratic program for approximate inference is employed for recovering those source signals. On a real-world energy disaggregation data set, we show that the use of SACs dramatically improves the original AFHMM, and significantly improves over a recent state-of-the art approach.
RAAM: The Benefits of Robustness in Approximating Aggregated MDPs in Reinforcement Learning
Petrik, Marek, Subramanian, Dharmashankar
We describe how to use robust Markov decision processes for value function approximation with state aggregation. The robustness serves to reduce the sensitivity to the approximation error of sub-optimal policies in comparison to classical methods such as fitted value iteration. This results in reducing the bounds on the gamma-discounted infinite horizon performance loss by a factor of 1/(1-gamma) while preserving polynomial-time computational complexity. Our experimental results show that using the robust representation can significantly improve the solution quality with minimal additional computational cost.
Gaussian Process Volatility Model
Wu, Yue, Hernández-Lobato, José Miguel, Ghahramani, Zoubin
The prediction of time-changing variances is an important task in the modeling of financial data. Standard econometric models are often limited as they assume rigid functional relationships for the evolution of the variance. Moreover, functional parameters are usually learned by maximum likelihood, which can lead to overfitting. To address these problems we introduce GP-Vol, a novel non-parametric model for time-changing variances based on Gaussian Processes. This new model can capture highly flexible functional relationships for the variances. Furthermore, we introduce a new online algorithm for fast inference in GP-Vol. This method is much faster than current offline inference procedures and it avoids overfitting problems by following a fully Bayesian approach. Experiments with financial data show that GP-Vol performs significantly better than current standard alternatives.