Genre
On SAT representations of XOR constraints
Gwynne, Matthew, Kullmann, Oliver
We study the representation of systems S of linear equations over the two-element field (aka xor- or parity-constraints) via conjunctive normal forms F (boolean clause-sets). First we consider the problem of finding an "arc-consistent" representation ("AC"), meaning that unit-clause propagation will fix all forced assignments for all possible instantiations of the xor-variables. Our main negative result is that there is no polysize AC-representation in general. On the positive side we show that finding such an AC-representation is fixed-parameter tractable (fpt) in the number of equations. Then we turn to a stronger criterion of representation, namely propagation completeness ("PC") --- while AC only covers the variables of S, now all the variables in F (the variables in S plus auxiliary variables) are considered for PC. We show that the standard translation actually yields a PC representation for one equation, but fails so for two equations (in fact arbitrarily badly). We show that with a more intelligent translation we can also easily compute a translation to PC for two equations. We conjecture that computing a representation in PC is fpt in the number of equations.
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search
Guez, Arthur, Silver, David, Dayan, Peter
Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems -- because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.
Contextually Supervised Source Separation with Application to Energy Disaggregation
We propose a new framework for single-channel source separation that lies between the fully supervised and unsupervised setting. Instead of supervision, we provide input features for each source signal and use convex methods to estimate the correlations between these features and the unobserved signal decomposition. We analyze the case of $\ell_2$ loss theoretically and show that recovery of the signal components depends only on cross-correlation between features for different signals, not on correlations between features for the same signal. Contextually supervised source separation is a natural fit for domains with large amounts of data but no explicit supervision; our motivating application is energy disaggregation of hourly smart meter data (the separation of whole-home power signals into different energy uses). Here we apply contextual supervision to disaggregate the energy usage of thousands homes over four years, a significantly larger scale than previously published efforts, and demonstrate on synthetic data that our method outperforms the unsupervised approach.
Recursive Compressed Sensing
Freris, Nikolaos M., Öçal, Orhan, Vetterli, Martin
We introduce a recursive algorithm for performing compressed sensing on streaming data. The approach consists of a) recursive encoding, where we sample the input stream via overlapping windowing and make use of the previous measurement in obtaining the next one, and b) recursive decoding, where the signal estimate from the previous window is utilized in order to achieve faster convergence in an iterative optimization scheme applied to decode the new one. To remove estimation bias, a two-step estimation procedure is proposed comprising support set detection and signal amplitude estimation. Estimation accuracy is enhanced by a non-linear voting method and averaging estimates over multiple windows. We analyze the computational complexity and estimation error, and show that the normalized error variance asymptotically goes to zero for sublinear sparsity. Our simulation results show speed up of an order of magnitude over traditional CS, while obtaining significantly lower reconstruction error under mild conditions on the signal magnitudes and the noise level.
Identification of Gaussian Process State-Space Models with Particle Stochastic Approximation EM
Frigola, Roger, Lindsten, Fredrik, Schön, Thomas B., Rasmussen, Carl E.
Dept. of Information Technology, Uppsala University, Sweden.Abstract: Gaussian process state-space models (GP-SSMs) are a very flexible family of models of nonlinear dynamical systems. They comprise a Bayesian nonparametric representation of the dynamics of the system and additional (hyper-)parameters governing the properties of this nonparametric representation. The Bayesian formalism enables systematic reasoning about the uncertainty in the system dynamics. We present an approach to maximum likelihood identification of the parameters in GP-SSMs, while retaining the full nonparametric description of the dynamics. The method is based on a stochastic approximation version of the EM algorithm that employs recent developments in particle Markov chain Monte Carlo for efficient identification. INTRODUCTION Inspired by recent developments in robotics and machine learning, we aim at constructing models of nonlinear dynamical systems capable of quantifying the uncertainty in their predictions.
The Bernstein Function: A Unifying Framework of Nonconvex Penalization in Sparse Estimation
In this paper we study nonconvex penalization using Bernstein functions. Since the Bernstein function is concave and nonsmooth at the origin, it can induce a class of nonconvex functions for high-dimensional sparse estimation problems. We derive a threshold function based on the Bernstein penalty and give its mathematical properties in sparsity modeling. We show that a coordinate descent algorithm is especially appropriate for penalized regression problems with the Bernstein penalty. Additionally, we prove that the Bernstein function can be defined as the concave conjugate of a $\varphi$-divergence and develop a conjugate maximization algorithm for finding the sparse solution. Finally, we particularly exemplify a family of Bernstein nonconvex penalties based on a generalized Gamma measure and conduct empirical analysis for this family.
The Matrix Ridge Approximation: Algorithms and Applications
We are concerned with an approximation problem for a symmetric positive semidefinite matrix due to motivation from a class of nonlinear machine learning methods. We discuss an approximation approach that we call {matrix ridge approximation}. In particular, we define the matrix ridge approximation as an incomplete matrix factorization plus a ridge term. Moreover, we present probabilistic interpretations using a normal latent variable model and a Wishart model for this approximation approach. The idea behind the latent variable model in turn leads us to an efficient EM iterative method for handling the matrix ridge approximation problem. Finally, we illustrate the applications of the approximation approach in multivariate data analysis. Empirical studies in spectral clustering and Gaussian process regression show that the matrix ridge approximation with the EM iteration is potentially useful.
Markov Network Structure Learning via Ensemble-of-Forests Models
Arvaniti, Eirini, Claassen, Manfred
Real world systems typically feature a variety of different dependency types and topologies that complicate model selection for probabilistic graphical models. We introduce the ensemble-of-forests model, a generalization of the ensemble-of-trees model. Our model enables structure learning of Markov random fields (MRF) with multiple connected components and arbitrary potentials. We present two approximate inference techniques for this model and demonstrate their performance on synthetic data. Our results suggest that the ensemble-of-forests approach can accurately recover sparse, possibly disconnected MRF topologies, even in presence of non-Gaussian dependencies and/or low sample size. We applied the ensemble-of-forests model to learn the structure of perturbed signaling networks of immune cells and found that these frequently exhibit non-Gaussian dependencies with disconnected MRF topologies. In summary, we expect that the ensemble-of-forests model will enable MRF structure learning in other high dimensional real world settings that are governed by non-trivial dependencies.
Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
Frigola, Roger, Lindsten, Fredrik, Schön, Thomas B., Rasmussen, Carl E.
State-space models are successfully used in many areas of science, engineering and economics to model time series and dynamical systems. We present a fully Bayesian approach to inference \emph{and learning} (i.e. state estimation and system identification) in nonlinear nonparametric state-space models. We place a Gaussian process prior over the state transition dynamics, resulting in a flexible model able to capture complex dynamical phenomena. To enable efficient inference, we marginalize over the transition dynamics function and infer directly the joint smoothing distribution using specially tailored Particle Markov Chain Monte Carlo samplers. Once a sample from the smoothing distribution is computed, the state transition predictive distribution can be formulated analytically. Our approach preserves the full nonparametric expressivity of the model and can make use of sparse Gaussian processes to greatly reduce computational complexity.
Stable mixed graphs
In this paper, we study classes of graphs with three types of edges that capture the modified independence structure of a directed acyclic graph (DAG) after marginalisation over unobserved variables and conditioning on selection variables using the $m$-separation criterion. These include MC, summary, and ancestral graphs. As a modification of MC graphs, we define the class of ribbonless graphs (RGs) that permits the use of the $m$-separation criterion. RGs contain summary and ancestral graphs as subclasses, and each RG can be generated by a DAG after marginalisation and conditioning. We derive simple algorithms to generate RGs, from given DAGs or RGs, and also to generate summary and ancestral graphs in a simple way by further extension of the RG-generating algorithm. This enables us to develop a parallel theory on these three classes and to study the relationships between them as well as the use of each class.