A key problem in climate science is how to combine the predictions of the multi-model ensemble of global climate models. Recent work in machine learning (Monteleoni et al. 2011) showed the promise of an algorithm for online learning with experts for this task.We extend the Tracking Climate Models (TCM) approach to (1) take into account climate model predictions at higher spatial resolutions and (2) to model geospatial neighborhood influence between regions. Our algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model used by TCM, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best expert (climate model) at a given location. In experiments on historical data at a variety of spatial resolutions, our algorithm demonstrates improvements over TCM, when tracking global temperature anomalies.
We consider Bayesian analysis of a class of multiple changepoint models. While there are a variety of efficient ways to analyse these models if the parameters associated with each segment are independent, there are few general approaches for models where the parameters are dependent. Under the assumption that the dependence is Markov, we propose an efficient online algorithm for sampling from an approximation to the posterior distribution of the number and position of the changepoints. In a simulation study, we show that the approximation introduced is negligible. We illustrate the power of our approach through fitting piecewise polynomial models to data, under a model which allows for either continuity or discontinuity of the underlying curve at each changepoint. This method is competitive with, or out-performs, other methods for inferring curves from noisy data; and uniquely it allows for inference of the locations of discontinuities in the underlying curve.
A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration exploitation, in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.
Bai, Aijun (University of Science and Technology of China) | Wu, Feng (University of Southampton) | Zhang, Zongzhang (National University of Singapore) | Chen, Xiaoping (University of Science and Technology of China)
Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances between cumulative and simple regrets. The proposed algorithm Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D 2 NG-POMCP) treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. Bayesian method is used to model and infer the posterior distribution of these parameters by choosing the conjugate prior in the form of a combination of two Dirichlet and one NormalGamma distributions. Thompson sampling is exploited to guide the action selection in the search tree. Experimental results confirmed that our algorithm outperforms the state-of-the-art approaches on several common benchmark problems.
Scoring structures of undirected graphical models by means of evaluating the marginal likelihood is very hard. The main reason is the presence of the partition functionwhich is intractable to evaluate, let alone integrate over. We propose to approximate the marginal likelihood by employing two levels of approximation: we assume normality of the posterior (the Laplace approximation) and approximate allremaining intractable quantities using belief propagation and the linear response approximation.