Goto

Collaborating Authors

 Learning Graphical Models


Structured Generative Models of Natural Source Code

arXiv.org Machine Learning

We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have three key properties: First, they incorporate both sequential and hierarchical structure. Second, we learn a distributed representation of source code elements. Finally, they integrate closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the models, measured by the probability of generating test programs.


HC-Search: A Learning Framework for Search-based Structured Prediction

Journal of Artificial Intelligence Research

Structured prediction is the problem of learning a function that maps structured inputs to structured outputs. Prototypical examples of structured prediction include part-of-speech tagging and semantic segmentation of images. Inspired by the recent successes of search-based structured prediction, we introduce a new framework for structured prediction called HC-Search. Given a structured input, the framework uses a search procedure guided by a learned heuristic H to uncover high quality candidate outputs and then employs a separate learned cost function C to select a final prediction among those outputs. The overall loss of this prediction architecture decomposes into the loss due to H not leading to high quality outputs, and the loss due to C not selecting the best among the generated outputs. Guided by this decomposition, we minimize the overall loss in a greedy stage-wise manner by first training H to quickly uncover high quality outputs via imitation learning, and then training C to correctly rank the outputs generated via H according to their true losses. Importantly, this training procedure is sensitive to the particular loss function of interest and the time-bound allowed for predictions. Experiments on several benchmark domains show that our approach significantly outperforms several state-of-the-art methods.


Lifted Tree-Reweighted Variational Inference

arXiv.org Artificial Intelligence

We analyze variational inference for highly symmetric graphical models such as those arising from first-order probabilistic models. We first show that for these graphical models, the tree-reweighted variational objective lends itself to a compact lifted formulation which can be solved much more efficiently than the standard TRW formulation for the ground graphical model. Compared to earlier work on lifted belief propagation, our formulation leads to a convex optimization problem for lifted marginal inference and provides an upper bound on the partition function. We provide two approaches for improving the lifted TRW upper bound. The first is a method for efficiently computing maximum spanning trees in highly symmetric graphs, which can be used to optimize the TRW edge appearance probabilities. The second is a method for tightening the relaxation of the marginal polytope using lifted cycle inequalities and novel exchangeable cluster consistency constraints.


Game-Theoretic Patrolling with Dynamic Execution Uncertainty and a Case Study on a Real Transit System

Journal of Artificial Intelligence Research

Attacker-Defender Stackelberg security games (SSGs) have emerged as an important research area in multi-agent systems. However, existing SSGs models yield fixed, static, schedules which fail in dynamic domains where defenders face execution uncertainty, i.e., in domains where defenders may face unanticipated disruptions of their schedules. A concrete example is an application involving checking fares on trains, where a defender's schedule is frequently interrupted by fare evaders, making static schedules useless. To address this shortcoming, this paper provides four main contributions. First, we present a novel general Bayesian Stackelberg game model for security resource allocation in dynamic uncertain domains. In this new model, execution uncertainty is handled by using a Markov decision process (MDP) for generating defender policies. Second, we study the problem of computing a Stackelberg equilibrium for this game and exploit problem structure to reduce it to a polynomial-sized optimization problem. Shifting to evaluation, our third contribution shows, in simulation, that our MDP-based policies overcome the failures of previous SSG algorithms. In so doing, we can now build a complete system, that enables handling of schedule interruptions and, consequently, to conduct some of the first controlled experiments on SSGs in the field.


Authorship Attribution through Function Word Adjacency Networks

arXiv.org Machine Learning

A method for authorship attribution based on function word adjacency networks (WANs) is introduced. Function words are parts of speech that express grammatical relationships between other words but do not carry lexical meaning on their own. In the WANs in this paper, nodes are function words and directed edges stand in for the likelihood of finding the sink word in the ordered vicinity of the source word. WANs of different authors can be interpreted as transition probabilities of a Markov chain and are therefore compared in terms of their relative entropies. Optimal selection of WAN parameters is studied and attribution accuracy is benchmarked across a diverse pool of authors and varying text lengths. This analysis shows that, since function words are independent of content, their use tends to be specific to an author and that the relational data captured by function WANs is a good summary of stylometric fingerprints. Attribution accuracy is observed to exceed the one achieved by methods that rely on word frequencies alone. Further combining WANs with methods that rely on word frequencies alone, results in larger attribution accuracy, indicating that both sources of information encode different aspects of authorial styles.


Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm

arXiv.org Machine Learning

We study Bayesian optimal control of a general class of smoothly parameterized Markov decision problems. Since computing the optimal control is computationally expensive, we design an algorithm that trades off performance for computational efficiency. The algorithm is a lazy posterior sampling method that maintains a distribution over the unknown parameter. The algorithm changes its policy only when the variance of the distribution is reduced sufficiently. Importantly, we analyze the algorithm and show the precise nature of the performance vs. computation tradeoff. Finally, we show the effectiveness of the method on a web server control application.


Investigation of commuting Hamiltonian in quantum Markov network

arXiv.org Artificial Intelligence

Noname manuscript No. (will be inserted by the editor) Abstract Graphical Models have various applications in science and engineering which include physics, bioinformatics, telecommunication and etc. Usage of graphical models needs complex computations in order to evaluation of marginal functions, so there are some powerful methods including mean field approximation, belief propagation algorithm and etc. Quantum graphical models have been recently developed in context of quantum information and computation, and quantum statistical physics, which is possible by generalization of classical probability theory to quantum theory. The main goal of this paper is preparing a primary generalization of Markov network, as a type of graphical models, to quantum case and applying in quantum statistical physics. We have investigated the Markov network and the role of commuting Hamiltonian terms in conditional independence with simple examples of quantum statistical physics. Keywords Graphical models · Quantum graphical models · Conditional independence · Quantum conditional independence · Commuting Hamiltonian · Quantum Markov network 1 Introduction In 1988, Pearl devised belief propagation algorithm to solve marginalization and other inference problems.


Expressive Power and Approximation Errors of Restricted Boltzmann Machines

arXiv.org Machine Learning

We present explicit classes of probability distributions that can be learned by Restricted Boltzmann Machines (RBMs) depending on the number of units that they contain, and which are representative for the expressive power of the model. We use this to show that the maximal Kullback-Leibler divergence to the RBM model with $n$ visible and $m$ hidden units is bounded from above by $n - \left\lfloor \log(m+1) \right\rfloor - \frac{m+1}{2^{\left\lfloor\log(m+1)\right\rfloor}} \approx (n -1) - \log(m+1)$. In this way we can specify the number of hidden units that guarantees a sufficiently rich model containing different classes of distributions and respecting a given error tolerance.


Event and Anomaly Detection Using Tucker3 Decomposition

arXiv.org Artificial Intelligence

Failure detection in telecommunication networks is a vital task. So far, several supervised and unsupervised solutions have been provided for discovering failures in such networks. Among them unsupervised approaches has attracted more attention since no label data is required. Often, network devices are not able to provide information about the type of failure. In such cases the type of failure is not known in advance and the unsupervised setting is more appropriate for diagnosis. Among unsupervised approaches, Principal Component Analysis (PCA) is a well-known solution which has been widely used in the anomaly detection literature and can be applied to matrix data (e.g. Users-Features). However, one of the important properties of network data is their temporal sequential nature. So considering the interaction of dimensions over a third dimension, such as time, may provide us better insights into the nature of network failures. In this paper we demonstrate the power of three-way analysis to detect events and anomalies in time-evolving network data.


Guarantees and Limits of Preprocessing in Constraint Satisfaction and Reasoning

arXiv.org Artificial Intelligence

We present a first theoretical analysis of the power of polynomial-time preprocessing for important combinatorial problems from various areas in AI. We consider problems from Constraint Satisfaction, Global Constraints, Satisfiability, Nonmonotonic and Bayesian Reasoning under structural restrictions. All these problems involve two tasks: (i) identifying the structure in the input as required by the restriction, and (ii) using the identified structure to solve the reasoning task efficiently. We show that for most of the considered problems, task (i) admits a polynomial-time preprocessing to a problem kernel whose size is polynomial in a structural problem parameter of the input, in contrast to task (ii) which does not admit such a reduction to a problem kernel of polynomial size, subject to a complexity theoretic assumption. As a notable exception we show that the consistency problem for the AtMost-NValue constraint admits a polynomial kernel consisting of a quadratic number of variables and domain values. Our results provide a firm worst-case guarantees and theoretical boundaries for the performance of polynomial-time preprocessing algorithms for the considered problems.