Goto

Collaborating Authors

 Markov Models


Tree-Based Solution Methods for Multiagent POMDPs with Delayed Communication

AAAI Conferences

Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. This model allows agents to act on their most recent (private) observation. Such an assumption is a strict generalization over having agents wait until the global information is available and is more appropriate for applications in which response time is critical. In this setting, however, value function backups are significantly more costly, and naive application of incremental pruning, the core of many state-of-the-art optimal POMDP techniques, is intractable. In this paper, we overcome this problem by demonstrating that computation of the MPOMDP-DC backup can be structured as a tree and introducing two novel tree-based pruning techniques that exploit this structure in an effective way. We experimentally show that these methods have the potential to outperform naive incremental pruning by orders of magnitude, allowing for the solution of larger problems.


Bayes-Adaptive Interactive POMDPs

AAAI Conferences

We introduce the Bayes-Adaptive Interactive Partially Observable Markov Decision Process (BA-IPOMDP), the first multiagent decision model that explicitly incorporates model learning. As in I-POMDPs, the BA-IPOMDP agent maintains beliefs over interactive states, which include the physical states as well as the other agentsโ€™ models. The BA-IPOMDP assumes that the state transition and observation probabilities are unknown, and augments the interactive states to include these parameters. Beliefs are maintained over this augmented interactive state space. This (necessary) state expansion exacerbates the curse of dimensionality, especially since each I-POMDP belief update is already a recursive procedure (because an agent invokes belief updates from other agentsโ€™ perspectives as part of its own belief update, in order to anticipate other agentsโ€™ actions). We extend the interactive particle filter to perform approximate belief update on BA-IPOMDPs. We present our findings on the multiagent Tiger problem.


Competing with Humans at Fantasy Football: Team Formation in Large Partially-Observable Domains

AAAI Conferences

We present the first real-world benchmark for sequentially-optimal team formation, working within the framework of a class of online football prediction games known as Fantasy Football. We model the problem as a Bayesian reinforcement learning one, where the action space is exponential in the number of players and where the decision maker's beliefs are over multiple characteristics of each footballer. We then exploit domain knowledge to construct computationally tractable solution techniques in order to build a competitive automated Fantasy Football manager. Thus, we are able to establish the baseline performance in this domain, even without complete information on footballers' performances (accessible to human managers), showing that our agent is able to rank at around the top percentile when pitched against 2.5M human players.


Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs

AAAI Conferences

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. We propose a distributed reinforcement learning approach, where agents take turns to learn best responses to each otherโ€™s policies. This promotes decentralization of the policy computation problem, and relaxes reliance on the full knowledge of the problem parameters. We derive the relation between the sample complexity of best response learning and error tolerance. Our key contribution is to show that sample complexity could grow exponentially with the problem horizon. We show empirically that even if the sample requirement is set lower than what theory demands, our learning approach can produce (near) optimal policies in some benchmark Dec-POMDP problems.


Markov Network Structure Learning: A Randomized Feature Generation Approach

AAAI Conferences

The structure of a Markov network is typically learned in one of two ways. The first approach is to treat this task as a global search problem. However, these algorithms are slow as they require running the expensive operation of weight (i.e., parameter) learning many times. The second approach involves learning a set of local models and then combining them into a global model. However, it can be computationally expensive to learn the local models for datasets that contain a large number of variables and/or examples. This paper pursues a third approach that views Markov network structure learning as a feature generation problem. The algorithm combines a data-driven, specific-to-general search strategy with randomization to quickly generate a large set of candidate features that all have support in the data. It uses weight learning, with L1 regularization, to select a subset of generated features to include in the model. On a large empirical study, we find that our algorithm is equivalently accurate to other state-of-the-art methods while exhibiting a much faster run time.


Sequence Labeling with Non-Negative Weighted Higher Order Features

AAAI Conferences

In sequence labeling, using higher order features leads to high inference complexity. A lot of studies have been conducted to address this problem. In this paper, we propose a new exact decoding algorithm under the assumption that weights of all higher order features are non-negative. In the worst case, the time complexity of our algorithm is quadratic on the number of higher order features. Comparing with existing algorithms, our method is more efficient and easier to implement. We evaluate our method on two sequence labeling tasks: Optical Character Recognition and Chinese part-of-speech tagging. Our experimental results demonstrate that adding higher order features significantly improves the performance while requiring only 30% additional inference time.


Context Tree Maximizing

AAAI Conferences

Recent developments in reinforcement learning for non-Markovianproblems witness a surge in history-based methods, among which weare particularly interested in two frameworks, PhiMDP and MC-AIXI-CTW. PhiMDP attempts to reduce the general RL problem, where the environment's states and dynamics are both unknown, toan MDP, while MC-AIXI-CTW incrementally learns a mixture of contexttrees as its environment model. The main idea of PhiMDP is toconnect generic reinforcement learning with classical reinforcementlearning. The first implementation of PhiMDP relies on astochastic search procedure for finding a tree that minimizes acertain cost function. This does not guarantee finding theminimizing tree, or even a good one, given limited search time. As aconsequence it appears that the approach has difficulties with largedomains. MC-AIXI-CTW is attractive in that it can incrementally andanalytically compute the internal model through interactions withthe environment. Unfortunately, it is computationally demanding dueto requiring heavy planning simulations at every single time step.We devise a novel approach called CTMRL, which analytically andefficiently finds the cost-minimizing tree. Instead of thecontext-tree weighting method that MC-AIXI-CTW is based on, we usethe closely related context-tree maximizing algorithm that selectsjust one single tree. This approach falls under the PhiMDPframework, which allows the replacement of the costly planningcomponent of MC-AIXI-CTW with simple Q-Learning. Our empiricalinvestigation show that CTMRL finds policies of quality as good as MC-AIXI-CTW's on sixdomains including a challenging Pacman domain, but in an order ofmagnitude less time.


Kernel-Based Reinforcement Learning on Representative States

AAAI Conferences

Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, our policies can be computed in O ( n ) time, where n is the number of training examples. The time complexity of kernel-based RL is ฮฉ( n 2 ). We introduce our method, analyze its convergence, and compare it to existing work. The method is evaluated on two existing control problems with 2 to 4 continuous variables and a new problem with 64 variables. In all cases, we outperform state-of-the-art results and offer simpler solutions.


Non-Intrusive Load Monitoring Using Prior Models of General Appliance Types

AAAI Conferences

Non-intrusive appliance load monitoring is the process of disaggregating a household's total electricity consumption into its contributing appliances. In this paper we propose an approach by which individual appliances can be iteratively separated from an aggregate load. Unlike existing approaches, our approach does not require training data to be collected by sub-metering individual appliances, nor does it assume complete knowledge of the appliances present in the household. Instead, we propose an approach in which prior models of general appliance types are tuned to specific appliance instances using only signatures extracted from the aggregate load. The tuned appliance models are then used to estimate each appliance's load, which is subsequently subtracted from the aggregate load. This process is applied iteratively until all appliances for which prior behaviour models are known have been disaggregated. We evaluate the accuracy of our approach using the REDD data set, and show the disaggregation performance when using our training approach is comparable to when sub-metered training data is used. We also present a deployment of our system as a live application and demonstrate the potential for personalised energy saving feedback.


Global Climate Model Tracking Using Geospatial Neighborhoods

AAAI Conferences

A key problem in climate science is how to combine the predictions of the multi-model ensemble of global climate models. Recent work in machine learning (Monteleoni et al. 2011) showed the promise of an algorithm for online learning with experts for this task.We extend the Tracking Climate Models (TCM) approach to (1) take into account climate model predictions at higher spatial resolutions and (2) to model geospatial neighborhood influence between regions. Our algorithm enables neighborhood influence by modifying the transition dynamics of the Hidden Markov Model used by TCM, allowing the performance of spatial neighbors to influence the temporal switching probabilities for the best expert (climate model) at a given location. In experiments on historical data at a variety of spatial resolutions, our algorithm demonstrates improvements over TCM, when tracking global temperature anomalies.