Learning Graphical Models
Global Distant Supervision for Relation Extraction
Han, Xianpei (Institute of Software, Chinese Academy of Sciences) | Sun, Le (Institute of Software, Chinese Academy of Sciences)
Machine learning approaches to relation extraction are typically supervised and require expensive labeled data. To break the bottleneck of labeled data, a promising approach is to exploit easily obtained indirect supervision knowledge – which we usually refer to as distant supervision (DS). However, traditional DS methods mostly only exploit one specific kind of indirect supervision knowledge – the relations/facts in a given knowledge base, thus often suffer from the problem of lack of supervision. In this paper, we propose a global distant supervision model for relation extraction, which can: 1) compensate the lack of supervision with a wide variety of indirect supervision knowledge; and 2) reduce the uncertainty in DS by performing joint inference across relation instances. Experimental results show that, by exploiting the consistency between relation labels, the consistency between relations and arguments, and the consistency between neighbor instances using Markov logic, our method significantly outperforms traditional DS approaches.
A Probabilistic Approach to Knowledge Translation
Jiang, Shangpu (University of Oregon) | Lowd, Daniel (University of Oregon) | Dou, Dejing (University of Oregon )
In this paper, we focus on a novel knowledge reuse scenario where the knowledge in the source schema needs to be translated to a semantically heterogeneous target schema. We refer to this task as “knowledge translation” (KT). Unlike data translation and transfer learning, KT does not require any data from the source or target schema. We adopt a probabilistic approach to KT by representing the knowledge in the source schema, the mapping between the source and target schemas, and the resulting knowledge in the target schema all as probability distributions, specially using Markov random fields and Markov logic networks. Given the source knowledge and mappings, we use standard learning and inference algorithms for probabilistic graphical models to find an explicit probability distribution in the target schema that minimizes the Kullback-Leibler divergence from the implicit distribution. This gives us a compact probabilistic model that represents knowledge from the source schema as well as possible, respecting the uncertainty in both the source knowledge and the mapping. In experiments on both propositional and relational domains, we find that the knowledge obtained by KT is comparable to other approaches that require data, demonstrating that knowledge can be reused without data.
Incremental Stochastic Factorization for Online Reinforcement Learning
Barreto, Andre M. S. (Laboratório Nacional de Computação Científica) | Beirigo, Rafael L. (Laboratório Nacional de Computação Científica) | Pineau, Joelle (McGill University) | Precup, Doina (McGill University)
A construct that has been receiving attention recently in reinforcement learning is stochastic factorization (SF), a particular case of non-negative factorization (NMF) in which the matrices involved are stochastic. The idea is to use SF to approximate the transition matrices of a Markov decision process (MDP). This is useful for two reasons. First, learning the factors of the SF instead of the transition matrices can reduce significantly the number of parameters to be estimated. Second, it has been shown that SF can be used to reduce the number of operations needed to compute an MDP's value function. Recently, an algorithm called expectation-maximization SF (EMSF) has been proposed to compute a SF directly from transitions sampled from an MDP. In this paper we take a closer look at EMSF. First, by exploiting the assumptions underlying the algorithm, we show that it is possible to reduce it to simple multiplicative update rules similar to the ones that helped popularize NMF. Second, we analyze the optimization process underlying EMSF and find that it minimizes a modified version of the Kullback-Leibler divergence that is particularly well-suited for learning a SF from data sampled from an arbitrary distribution. Third, we build on this improved understanding of EMSF to draw an interesting connection with NMF and probabilistic latent semantic analysis. We also exploit the simplified update rules to introduce a new version of EMSF that generalizes and significantly improves its precursor. This new algorithm provides a practical mechanism to control the trade-off between memory usage and computing time, essentially freeing the space complexity of EMSF from its dependency on the number of sample transitions. The algorithm can also compute its approximation incrementally, which makes it possible to use it concomitantly with the collection of data. This feature makes the new version of EMSF particularly suitable for online reinforcement learning. Empirical results support the utility of the proposed algorithm.
Learning a Hybrid Architecture for Sequence Regression and Annotation
Zhang, Yizhe (Duke University) | Henao, Ricardo (Duke University) | Carin, Lawrence (Duke University ) | Zhong, Jianling (Duke University) | Hartemink, Alexander (Duke University)
When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is compatible with a rich set of mapping functions. Results show that the availability of additional continuous response variables can simultaneously improve the annotation of the sequential observations and yield good prediction performance in both synthetic data and real-world datasets.
Inferring Multi-Dimensional Ideal Points for US Supreme Court Justices
Islam, Mohammad Raihanul (Virginia Polytechnic Institute and State University) | Hossain, K. S. M. Tozammel (Virginia Polytechnic Institute and State University) | Krishnan, Siddharth (Virginia Polytechnic Institute and State University) | Ramakrishnan, Naren (Virginia Polytechnic Institute and State University)
In Supreme Court parlance and the political science literature, an ideal point positions a justice in a continuous space and can be interpreted as a quantification of the justice's policy preferences. We present an automated approach to infer such ideal points for justices of the US Supreme Court. This approach combines topic modeling over case opinions with the voting (and endorsing) behavior of justices. Furthermore, given a topic of interest, say the Fourth Amendment, the topic model can be optionally seeded with supervised information to steer the inference of ideal points. Application of this methodology over five years of cases provides interesting perspectives into the leaning of justices on crucial issues, coalitions underlying specific topics, and the role of swing justices in deciding the outcomes of cases.
Instance Specific Metric Subspace Learning: A Bayesian Approach
Ye, Han-Jia (Nanjing University) | Zhan, De-Chuan (Nanjing University) | Jiang, Yuan (Nanjing University)
Instead of using a uniform metric, instance specific distance learning methods assign multiple metrics for different localities, which take data heterogeneity into consideration. Therefore, they may improve the performance of distance based classifiers, e.g., kNN. Existing methods obtain multiple metrics of test data by either transductively assigning metrics for unlabeled instances or designing distance functions manually, which are with limited generalization ability. In this paper, we propose isMets (Instance Specific METric Subspace) framework which can automatically span the whole metric space in a generative manner and is able to inductively learn a specific metric subspace for each instance via inferring the expectation over the metric bases in a Bayesian manner. The whole framework can be solved with Variational Bayes (VB). Experiment on synthetic data shows that the learned results are with good interpretability. Moreover, comprehensive results on real world datasets validate the effectiveness and robustness of isMets.
DECT: Distributed Evolving Context Tree for Understanding User Behavior Pattern Evolution
Shu, Xiaokui (Virginia Polytechnic Institute and State University) | Laptev, Nikolay (Yahoo Labs) | Yao, Danfeng (Daphne) (Virginia Polytechnic Institute and State University)
Internet user behavior models characterize user browsing dynamics or the transitions among web pages. The models help Internet companies improve their services by accurately targeting customers and providing them the information they want. For instance, specific web pages can be customized and prefetched for individuals based on sequences of web pages they have visited. Existing user behavior models abstracted as time-homogeneous Markov models cannot efficiently model user behavior variation through time. This demo presents DECT, a scalable time-variant variable-order Markov model. DECT digests terabytes of user session data and yields user behavior patterns through time. We realize DECT using Apache Spark and deploy it on top of Yahoo! infrastructure. We demonstrate the benefits of DECT with anomaly detection and ad click rate prediction applications. DECT enables the detection of higher-order path anomalies and provides deep insights into ad click rates with respect to user visiting paths.
Scaling-Up MAP and Marginal MAP Inference in Markov Logic
Sarkhel, Somdeb (The University of Texas at Dallas)
Markov Logic Networks (MLNs) use a few weighted first-order logic formulas to represent large probabilistic graphical models and are ideally suited for representing both relational and probabilistic knowledge in a wide variety of application domains such as, NLP, computer vision, and robotics. However, inference in them is hard because the graphical models can be extremely large, having millions of variables and features (potentials). Therefore, several lifted inference algorithms that exploit relational structure and operate at the compact first-order level, have been developed in recent years. However, the focus of much of existing research on lifted inference is on marginal inference while algorithms for MAP and marginal MAP inference are far less advanced. The aim of the proposed thesis is to fill this void, by developing next generation inference algorithms for MAP and marginal MAP inference.
Abstracting Complex Domains Using Modular Object-Oriented Markov Decision Processes
Squire, Shawn (University of Maryland, Baltimore County) | desJardins, Marie (University of Maryland, Baltimore County)
We present an initial proposal for modular object-oriented MDPs, an extension of OO-MDPs that abstracts complex domains that are partially observable and stochastic with multiple goals. Modes reduce the curse of dimensionality by reducing the number of attributes, objects, and actions into only the features relevant for each goal. These modes may also be used as an abstracted domain to be transferred to other modes or to another domain.
Bayesian AutoEncoder: Generation of Bayesian Networks with Hidden Nodes for Features
Nishino, Kaneharu (The University of Tokyo) | Inaba, Mary (The University of Tokyo)
We propose Bayesian AutoEncoder (BAE) in order to construct a recognition system which uses feedback information. BAE constructs a generative model of input data as a Bayes Net. The network trained by BAE obtains its hidden variables as the features of given data. It can execute inference for each variable through belief propagation, using both feedforward and feedback information. We confirmed that BAE can construct small networks with one hidden layer and extract features as hidden variables from 3x3 and 5x5 pixel input data.