Undirected Networks
Bayesian Policy Search with Policy Priors
Wingate, David (Massachusetts Institute of Technology) | Goodman, Noah D. (Stanford University) | Roy, Daniel M. (Massachusetts Institute of Technology) | Kaelbling, Leslie P. (Massachusetts Institute of Technology) | Tenenbaum, Joshua B. (Massachusetts Institute of Technology)
We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm---our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and intertask transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.
Learning Driving Behavior by Timed Syntactic Pattern Recognition
Verwer, Sicco (Katholieke Universiteit Leuven) | Weerdt, Mathijs de (Delft University of Technology) | Witteveen, Cees (Delft University of Technology)
The data at our disposal consists of onboard sensor measurements that have been collected from truck round-trips. We advocate the use of an explicit time representation By applying a simple discretization method, we obtain sequences in syntactic pattern recognition because it can of timed events. The behavior that is displayed in result in more succinct models and easier learning these sequences is unknown. From this data, we want to learn problems. We apply this approach to the real-world a model that we can use to monitor the driving behavior in problem of learning models for the driving behavior new data, i.e., to use it as a classifier. Our approach is to first of truck drivers. We discretize the values of learn a timed model from the unlabeled sequences using the onboard sensors into simple events.
Strategy Learning for Autonomous Agents in Smart Grid Markets
Reddy, Prashant P. (Carnegie Mellon University) | Veloso, Manuela M. (Carnegie Mellon University)
Distributed electricity producers, such as small wind farms and solar installations, pose several technical and economic challenges in Smart Grid design. One approach to addressing these challenges is through Broker Agents who buy electricity from distributed producers, and also sell electricity to consumers, via a Tariff Market--a new market mechanism where Broker Agents publish concurrent bid and ask prices. We investigate the learning of pricing strategies for an autonomous Broker Agent to profitably participate in a Tariff Market. We employ Markov Decision Processes (MDPs) and reinforcement learning. An important concern with this method is that even simple representations of the problem domain result in very large numbers of states in the MDP formulation because market prices can take nearly arbitrary real values. In this paper, we present the use of derived state space features, computed using statistics on Tariff Market prices and Broker Agent customer portfolios, to obtain a scalable state representation. We also contribute a set of pricing tactics that form building blocks in the learned Broker Agent strategy. We further present a Tariff Market simulation model based on real-world data and anticipated market dynamics. We use this model to obtain experimental results that show the learned strategy performing vastly better than a random strategy and significantly better than two other non-learning strategies.
Agent-Oriented Incremental Team and Activity Recognition
Masato, Daniele (University of Aberdeen) | Norman, Timothy J. (University of Aberdeen) | Vasconcelos, Wamberto W. (University of Aberdeen) | Sycara, Katia (Carnegie Mellon University)
Monitoring team activity is beneficial when human teams cooperate in the enactment of a joint plan. Monitoring allows teams to maintain awareness of each other's progress within the plan and it enables anticipation of information needs. Humans find this difficult, particularly in time-stressed and uncertain environments. In this paper we introduce a probabilistic model, based on Conditional Random Fields, to automatically recognise the composition of teams and the team activities in relation to a plan. The team composition and activities are recognised incrementally by interpreting a stream of spatio-temporal observations.
Generative Structure Learning for Markov Logic Networks Based on Graph of Predicates
Dinh, Quang-Thang (Universite d'Orleans) | Exbrayat, Matthieu (Universite d'Orleans) | Vrain, Christel (Universite d'Orleans)
In this paper we present a new algorithm for generatively learning the structure of Markov Logic Networks. This algorithm relies on a graph of predicates, which summarizes the links existing between predicates and on relational information between ground atoms in the training database. Candidate clauses are produced by means of a heuristical variabilization technique. According to our first experiments, this approach appears to be promising.
A Hidden Markov Model Variant for Sequence Classification
Blasiak, Sam (George Mason University) | Rangwala, Huzefa (George Mason University)
Sequence classification is central to many practical problems within machine learning. Distances metrics between arbitrary pairs of sequences can be hard to define because sequences can vary in length and the information contained in the order of sequence elements is lost when standard metrics such as Euclidean distance are applied. We present a scheme that employs a Hidden Markov Model variant to produce a set of fixed-length description vectors from a set of sequences. We then define three inference algorithms, a Baum-Welch variant, a Gibbs Sampling algorithm, and a variational algorithm, to infer model parameters. Finally, we show experimentally that the fixed length representation produced by these inference methods is useful for classifying sequences of amino acids into structural classes
Multi-Evidence Lifted Message Passing, with Application to PageRank and the Kalman Filter
Ahmadi, Babak (Fraunhofer IAIS) | Kersting, Kristian (Fraunhofer IAIS and University of Bonn) | Sanner, Scott (NICTA and Australian National University)
Lifted message passing algorithms exploit repeated structure within a given graphical model to answer queries efficiently. Given evidence, they construct a lifted network of supernodes and superpotentials corresponding to sets of nodes and potentials that are indistinguishable given the evidence. Recently, efficient algorithms were presented for updating the structure of an existing lifted network with incremental changes to the evidence. In the inference stage, however, current algorithms need to construct a separate lifted network for each evidence case and run a modified message passing algorithm on each lifted network separately. Consequently, symmetries across the inference tasks are not exploited. In this paper, we present a novel lifted message passing technique that exploits symmetries across multiple evidence cases. The benefits of this multi-evidence lifted inference are shown for several important AI tasks such as computing personalized PageRanks and Kalman filters via multi-evidence lifted Gaussian belief propagation.
Finite-Length Markov Processes with Constraints
Pachet, Francois (SONY CSL-Paris) | Roy, Pierre (SONY CSL-Paris) | Barbieri, Gabriele (SONY CSL-Paris)
Many systems use Markov models to generate finite-length sequences that imitate a given style. These systems often need to enforce specific control constraints on the sequences to generate. Unfortunately, control constraints are not compatible with Markov models, as they induce long-range dependencies that violate the Markov hypothesis of limited memory. Attempts to solve this issue using heuristic search do not give any guarantee on the nature and probability of the sequences generated. We propose a novel and efficient approach to controlled Markov generation for a specific class of control constraints that 1) guarantees that generated sequences satisfy control constraints and 2) follow the statistical distribution of the initial Markov model. Revisiting Markov generation in the framework of constraint satisfaction, we show how constraints can be compiled into a non-homogeneous Markov model, using arc-consistency techniques and renormalization. We illustrate the approach on a melody generation problem and sketch some realtime applications in which control constraints are given by gesture controllers.
Efficient Planning for Factored Infinite-Horizon DEC-POMDPs
Pajarinen, Joni Kristian (Aalto University) | Peltonen, Jaakko Tapani (Aalto University)
Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in optimization of wireless networks, and other scenarios where communication between agents is restricted by costs or physical limits. DEC-POMDPs are a promising solution, but optimizing policies quickly becomes computationally intractable when problem size grows. Factored DEC-POMDPs allow large problems to be described in compact form, but have the same worst case complexity as non-factored DEC-POMDPs. We propose an efficient optimization algorithm for large factored infinite-horizon DEC-POMDPs. We formulate expectation-maximization based optimization into a new form, where complexity can be kept tractable by factored approximations. Our method performs well, and it can solve problems with more agents and larger state spaces than state of the art DEC-POMDP methods. We give results for factored infinite-horizon DEC-POMDP problems with up to 10 agents.
Timing Tweets to Increase Effectiveness of Information Campaigns
Dabeer, Onkar (Tata Institute of Fundamental Research) | Mehendale, Prachi (Tata Institute of Fundamental Research) | Karnik, Aditya (India Science Lab) | Saroop, Atul (India Science Lab)
Microblogging websites such as Twitter are increasingly being used by businesses/campaigners for timely dissemination of information to their followers. The diffusion of a tweet depends on several factors: the activity of the follower nodes, the responsiveness of follower nodes to tweets from the source node, the out-degree of the follower nodes, the content of recent related tweets seen by the follower node, etc. Using such factors, in this paper, we propose a framework to measure the effectiveness of an information campaign over Twitter. We consider a positive as well as a negative metric to measure the impact of a tweet: while retweets are used to measure the positive impact, the lack of a timely response from an active follower node is taken as a potential negative impact. We investigate the scheduling of tweets to increase the net positive impact while keeping the net negative impact below a desired level. We propose and study several scheduling algorithms by casting the problem in a Markov Decision Process (MDP) framework. In order to compare our algorithms, we estimate the model parameters from tweet data collected using the Twitter API from an arbitrarily selected node and its 6837 followers over several months. For this dataset, we find that if successive tweets in the campaign are novel, then substantial gains over user activity based scheduling can be obtained by scheduling tweets in time slots where the ratio of the expected positive and negative metrics is high. We call this the MaxRatio policy and we show that it is optimal under certain conditions. In cases where we are not certain about the response of users to successive related tweets, we identify another algorithm (which we call MaxReach) as a robust alternative.