Undirected Networks
Auto-Regressive HMM Inference with Incomplete Data for Short-Horizon Wind Forecasting
Barber, Chris, Bockhorst, Joseph, Roebber, Paul
Accurate short-term wind forecasts (STWFs), with time horizons from 0.5 to 6 hours, are essential for efficient integration of wind power to the electrical power grid. Physical models based on numerical weather predictions are currently not competitive, and research on machine learning approaches is ongoing. Two major challenges confronting these efforts are missing observations and weather-regime induced dependency shifts among wind variables at geographically distributed sites. In this paper we introduce approaches that address both of these challenges. We describe a new regime-aware approach to STWF that use auto-regressive hidden Markov models (AR-HMM), a subclass of conditional linear Gaussian (CLG) models.
A POMDP Extension with Belief-dependent Rewards
Araya, Mauricio, Buffet, Olivier, Thomas, Vincent, Charpillet, Franรงcois
Unfortunately, some problems cannot be modeled with state-dependent reward functions, e.g., problems whose objective explicitly implies reducing the uncertainty on the state. To that end, we introduce rho-POMDPs, an extension of POMDPs where the reward function rho depends on the belief state. We show that, under the common assumption that rho is convex, the value function is also convex, what makes it possible to (1) approximate rho arbitrarily well with a piecewise linear and convex (PWLC) function, and (2) use state-of-the-art exact or approximate solving algorithms with limited changes. Papers published at the Neural Information Processing Systems Conference.
Cardinality Restricted Boltzmann Machines
Swersky, Kevin, Sutskever, Ilya, Tarlow, Daniel, Zemel, Richard S., Salakhutdinov, Russ R., Adams, Ryan P.
The Restricted Boltzmann Machine (RBM) is a popular density model that is also good for extracting features. A main source of tractability in RBM models is the model's assumption that given an input, hidden units activate independently from one another. Sparsity and competition in the hidden representation is believed to be beneficial, and while an RBM with competition among its hidden units would acquire some of the attractive properties of sparse coding, such constraints are not added due to the widespread belief that the resulting model would become intractable. In this work, we show how a dynamic programming algorithm developed in 1981 can be used to implement exact sparsity in the RBM's hidden units. We then expand on this and show how to pass derivatives through a layer of exact sparsity, which makes it possible to fine-tune a deep belief network (DBN) consisting of RBMs with sparse hidden layers.
Accelerated Adaptive Markov Chain for Partition Function Computation
Ermon, Stefano, Gomes, Carla P., Sabharwal, Ashish, Selman, Bart
We propose a novel Adaptive Markov Chain Monte Carlo algorithm to compute the partition function. In particular, we show how to accelerate a flat histogram sampling technique by significantly reducing the number of null moves'' in the chain, while maintaining asymptotic convergence properties. Our experiments show that our method converges quickly to highly accurate solutions on a range of benchmark instances, outperforming other state-of-the-art methods such as IJGP, TRW, and Gibbs sampling both in run-time and accuracy. We also show how obtaining a so-called density of states distribution allows for efficient weight learning in Markov Logic theories. Papers published at the Neural Information Processing Systems Conference.
A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes
Furmston, Thomas, Barber, David
Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being considered the current state of the art in the field. In this article we provide a unifying perspective of these two algorithms by showing that their step-directions in the parameter space are closely related to the search direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative gradient-based method for Markov Decision Processes. We are able show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.
Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning
Pajarinen, Joni K., Peltonen, Jaakko
Applications such as robot control and wireless communication require planning under uncertainty. Partially observable Markov decision processes (POMDPs) plan policies for single agents under uncertainty and their decentralized versions (DEC-POMDPs) find a policy for multiple agents. The policy in infinite-horizon POMDP and DEC-POMDP problems has been represented as finite state controllers (FSCs). We introduce a novel class of periodic FSCs, composed of layers connected only to the previous and next layer. Our periodic FSC method finds a deterministic finite-horizon policy and converts it to an initial periodic infinite-horizon policy.
Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization
Bach, Stephen, Broecheler, Matthias, Getoor, Lise, O', leary, Dianne
Probabilistic graphical models are powerful tools for analyzing constrained, continuous domains. However, finding most-probable explanations (MPEs) in these models can be computationally expensive. In this paper, we improve the scalability of MPE inference in a class of graphical models with piecewise-linear and piecewise-quadratic dependencies and linear constraints over continuous domains. We derive algorithms based on a consensus-optimization framework and demonstrate their superior performance over state of the art. We show empirically that in a large-scale voter-preference modeling problem our algorithms scale linearly in the number of dependencies and constraints. Papers published at the Neural Information Processing Systems Conference.
Optimization-Based MCMC Methods for Nonlinear Hierarchical Statistical Inverse Problems
Bardsley, Johnathan, Cui, Tiangang
In many hierarchical inverse problems, not only do we want to estimate high- or infinite-dimensional model parameters in the parameter-to-observable maps, but we also have to estimate hyperparameters that represent critical assumptions in the statistical and mathematical modeling processes. As a joint effect of high-dimensionality, nonlinear dependence, and non-concave structures in the joint posterior posterior distribution over model parameters and hyperparameters, solving inverse problems in the hierarchical Bayesian setting poses a significant computational challenge. In this work, we aim to develop scalable optimization-based Markov chain Monte Carlo (MCMC) methods for solving hierarchical Bayesian inverse problems with nonlinear parameter-to-observable maps and a broader class of hyperparameters. Our algorithmic development is based on the recently developed scalable randomize-then-optimize (RTO) method [4] for exploring the high- or infinite-dimensional model parameter space. By using RTO either as a proposal distribution in a Metropolis-within-Gibbs update or as a biasing distribution in the pseudo-marginal MCMC [2], we are able to design efficient sampling tools for hierarchical Bayesian inversion. In particular, the integration of RTO and the pseudo-marginal MCMC has sampling performance robust to model parameter dimensions. We also extend our methods to nonlinear inverse problems with Poisson-distributed measurements. Numerical examples in PDE-constrained inverse problems and positron emission tomography (PET) are used to demonstrate the performance of our methods.
Deep Learning for Asset Bubbles Detection
Bashchenko, Oksana, Marchal, Alexis
We develop a methodology for detecting asset bubbles using a neural network. We rely on the theory of local martingales in continuous-time and use a deep network to estimate the diffusion coefficient of the price process more accurately than the current estimator, obtaining an improved detection of bubbles. We show the outperformance of our algorithm over the existing statistical method in a laboratory created with simulated data. We then apply the network classification to real data and build a zero net exposure trading strategy that exploits the risky arbitrage emanating from the presence of bubbles in the US equity market from 2006 to 2008. The profitability of the strategy provides an estimation of the economical magnitude of bubbles as well as support for the theoretical assumptions relied on.
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Lan, Qingfeng, Pan, Yangchen, Fyshe, Alona, White, Martha
Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias interacts with performance, and the extent to which existing algorithms mitigate bias. In this paper, we 1) highlight that the effect of overestimation bias on learning efficiency is environment-dependent; 2) propose a generalization of Q-learning, called \emph{Maxmin Q-learning}, which provides a parameter to flexibly control bias; 3) show theoretically that there exists a parameter choice for Maxmin Q-learning that leads to unbiased estimation with a lower approximation variance than Q-learning; and 4) prove the convergence of our algorithm in the tabular case, as well as convergence of several previous Q-learning variants, using a novel Generalized Q-learning framework. We empirically verify that our algorithm better controls estimation bias in toy environments, and that it achieves superior performance on several benchmark problems.