Undirected Networks
Global Distant Supervision for Relation Extraction
Han, Xianpei (Institute of Software, Chinese Academy of Sciences) | Sun, Le (Institute of Software, Chinese Academy of Sciences)
Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break the bottleneck of labeled data, a promising approach is to exploit easily obtained indirect supervision knowledge – which we usually refer to as distant supervision (DS). However, traditional DS methods mostly only exploit one specific kind of indirect supervision knowledge – the relations/facts in a given knowledge base, thus often suffer from the problem of lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches.
A Probabilistic Approach to Knowledge Translation
Jiang, Shangpu (University of Oregon) | Lowd, Daniel (University of Oregon) | Dou, Dejing (University of Oregon )
In this paper, we focus on a novel knowledge reuse scenario where the knowledge in the source schema needs to be translated to a semantically heterogeneous target schema. We refer to this task as “knowledge translation” (KT). Unlike data translation and transfer learning, KT does not require any data from the source or target schema. We adopt a probabilistic approach to KT by representing the knowledge in the source schema, the mapping between the source and target schemas, and the resulting knowledge in the target schema all as probability distributions, specially using Markov random fields and Markov logic networks. Given the source knowledge and mappings, we use standard learning and inference algorithms for probabilistic graphical models to find an explicit probability distribution in the target schema that minimizes the Kullback-Leibler divergence from the implicit distribution. This gives us a compact probabilistic model that represents knowledge from the source schema as well as possible, respecting the uncertainty in both the source knowledge and the mapping. In experiments on both propositional and relational domains, we find that the knowledge obtained by KT is comparable to other approaches that require data, demonstrating that knowledge can be reused without data.
Incremental Stochastic Factorization for Online Reinforcement Learning
Barreto, Andre M. S. (Laboratório Nacional de Computação Científica) | Beirigo, Rafael L. (Laboratório Nacional de Computação Científica) | Pineau, Joelle (McGill University) | Precup, Doina (McGill University)
A construct that has been receiving attention recently in reinforcement learning is stochastic factorization (SF), a particular case of non-negative factorization (NMF) in which the matrices involved are stochastic. The idea is to use SF to approximate the transition matrices of a Markov decision process (MDP). This is useful for two reasons. First, learning the factors of the SF instead of the transition matrices can reduce significantly the number of parameters to be estimated. Second, it has been shown that SF can be used to reduce the number of operations needed to compute an MDP's value function. Recently, an algorithm called expectation-maximization SF (EMSF) has been proposed to compute a SF directly from transitions sampled from an MDP. In this paper we take a closer look at EMSF. First, by exploiting the assumptions underlying the algorithm, we show that it is possible to reduce it to simple multiplicative update rules similar to the ones that helped popularize NMF. Second, we analyze the optimization process underlying EMSF and find that it minimizes a modified version of the Kullback-Leibler divergence that is particularly well-suited for learning a SF from data sampled from an arbitrary distribution. Third, we build on this improved understanding of EMSF to draw an interesting connection with NMF and probabilistic latent semantic analysis. We also exploit the simplified update rules to introduce a new version of EMSF that generalizes and significantly improves its precursor. This new algorithm provides a practical mechanism to control the trade-off between memory usage and computing time, essentially freeing the space complexity of EMSF from its dependency on the number of sample transitions. The algorithm can also compute its approximation incrementally, which makes it possible to use it concomitantly with the collection of data. This feature makes the new version of EMSF particularly suitable for online reinforcement learning. Empirical results support the utility of the proposed algorithm.
Learning a Hybrid Architecture for Sequence Regression and Annotation
Zhang, Yizhe (Duke University) | Henao, Ricardo (Duke University) | Carin, Lawrence (Duke University ) | Zhong, Jianling (Duke University) | Hartemink, Alexander (Duke University)
When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is compatible with a rich set of mapping functions. Results show that the availability of additional continuous response variables can simultaneously improve the annotation of the sequential observations and yield good prediction performance in both synthetic data and real-world datasets.
DECT: Distributed Evolving Context Tree for Understanding User Behavior Pattern Evolution
Shu, Xiaokui (Virginia Polytechnic Institute and State University) | Laptev, Nikolay (Yahoo Labs) | Yao, Danfeng (Daphne) (Virginia Polytechnic Institute and State University)
Internet user behavior models characterize user browsing dynamics or the transitions among web pages. The models help Internet companies improve their services by accurately targeting customers and providing them the information they want. For instance, specific web pages can be customized and prefetched for individuals based on sequences of web pages they have visited. Existing user behavior models abstracted as time-homogeneous Markov models cannot efficiently model user behavior variation through time. This demo presents DECT, a scalable time-variant variable-order Markov model. DECT digests terabytes of user session data and yields user behavior patterns through time. We realize DECT using Apache Spark and deploy it on top of Yahoo! infrastructure. We demonstrate the benefits of DECT with anomaly detection and ad click rate prediction applications. DECT enables the detection of higher-order path anomalies and provides deep insights into ad click rates with respect to user visiting paths.
Scaling-Up MAP and Marginal MAP Inference in Markov Logic
Sarkhel, Somdeb (The University of Texas at Dallas)
Markov Logic Networks (MLNs) use a few weighted first-order logic formulas to represent large probabilistic graphical models and are ideally suited for representing both relational and probabilistic knowledge in a wide variety of application domains such as, NLP, computer vision, and robotics. However, inference in them is hard because the graphical models can be extremely large, having millions of variables and features (potentials). Therefore, several lifted inference algorithms that exploit relational structure and operate at the compact first-order level, have been developed in recent years. However, the focus of much of existing research on lifted inference is on marginal inference while algorithms for MAP and marginal MAP inference are far less advanced. The aim of the proposed thesis is to fill this void, by developing next generation inference algorithms for MAP and marginal MAP inference.
Abstracting Complex Domains Using Modular Object-Oriented Markov Decision Processes
Squire, Shawn (University of Maryland, Baltimore County) | desJardins, Marie (University of Maryland, Baltimore County)
We present an initial proposal for modular object-oriented MDPs, an extension of OO-MDPs that abstracts complex domains that are partially observable and stochastic with multiple goals. Modes reduce the curse of dimensionality by reducing the number of attributes, objects, and actions into only the features relevant for each goal. These modes may also be used as an abstracted domain to be transferred to other modes or to another domain.
A Unifying Variational Inference Framework for Hierarchical Graph-Coupled HMM with an Application to Influenza Infection
Fan, Kai (Duke University) | Li, Chunyuan (Duke University) | Heller, Katherine (Duke University)
The Hierarchical Graph-Coupled Hidden Markov Model (hGCHMM) is a useful tool for tracking and predicting the spread of contagious diseases, such as influenza, by leveraging social contact data collected from individual wearable devices. However, the existing inference algorithms depend on the assumption that the infection rates are small in probability, typically close to 0. The purpose of this paper is to build a unified learning framework for latent infection state estimation for the hGCHMM, regardless of the infection rate and transition function. We derive our algorithm based on a dynamic auto-encoding variational inference scheme, thus potentially generalizing the hGCHMM to models other than those that work on highly contagious diseases. We experimentally compare our approach with previous Gibbs EM algorithms and standard variational method mean-field inference, on both semi-synthetic data and app collected epidemiological and social records.
Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models
Serban, Iulian V. (University of Montreal) | Sordoni, Alessandro (University of Montreal) | Bengio, Yoshua (University of Montreal) | Courville, Aaron (University of Montreal) | Pineau, Joelle ( McGill University )
We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models. Generative models produce system responses that are autonomously generated word-by-word, opening up the possibility for realistic, flexible interactions. In support of this goal, we extend the recently proposed hierarchical recurrent encoder-decoder neural network to the dialogue domain, and demonstrate that this model is competitive with state-of-the-art neural language models and back-off n-gram models. We investigate the limitations of this and similar approaches, and show how its performance can be improved by bootstrapping the learning from a larger question-answer pair corpus and from pretrained word embeddings.
Modeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model
Nakahashi, Ryo (Sony Corporation) | Baker, Chris L. (Massachusetts Institute of Technology) | Tenenbaum, Joshua B. (Massachusetts Institute of Technology)
Most human behaviors consist of multiple parts, steps, or subtasks. These structures guide our ac- tion planning and execution, but when we observe others, the latent structure of their actions is typ- ically unobservable, and must be inferred in order to learn new skills by demonstration, or to as- sist others in completing their tasks. For example, an assistant who has learned the subgoal struc- ture of a colleague’s task can more rapidly rec- ognize and support their actions as they unfold. Here we model how humans infer subgoals from observations of complex action sequences using a nonparametric Bayesian model, which assumes that observed actions are generated by approxi- mately rational planning over unknown subgoal sequences. We test this model with a behavioral experiment in which humans observed different se- ries of goal-directed actions, and inferred both the number and composition of the subgoal sequences associated with each goal. The Bayesian model predicts human subgoal inferences with high ac- curacy, and significantly better than several al- ternative models and straightforward heuristics. Motivated by this result, we simulate how learn- ing and inference of subgoals can improve perfor- mance in an artificial user assistance task. The Bayesian model learns the correct subgoals from fewer observations, and better assists users by more rapidly and accurately inferring the goal of their actions than alternative approaches.