Goto

Collaborating Authors

 Learning Graphical Models


Distinguishing Cause and Effect via Second Order Exponential Models

arXiv.org Machine Learning

We propose a method to infer causal structures containing both discrete and continuous variables. The idea is to select causal hypotheses for which the conditional density of every variable, given its causes, becomes smooth. We define a family of smooth densities and conditional densities by second order exponential models, i.e., by maximizing conditional entropy subject to first and second statistical moments. If some of the variables take only values in proper subsets of R^n, these conditionals can induce different families of joint distributions even for Markov-equivalent graphs. We consider the case of one binary and one real-valued variable where the method can distinguish between cause and effect. Using this example, we describe that sometimes a causal hypothesis must be rejected because P(effect|cause) and P(cause) share algorithmic information (which is untypical if they are chosen independently). This way, our method is in the same spirit as faithfulness-based causal inference because it also rejects non-generic mutual adjustments among DAG-parameters.


Which graphical models are difficult to learn?

arXiv.org Machine Learning

We consider the problem of learning the structure of Ising models (pairwise binary Markov random fields) from i.i.d. samples. While several methods have been proposed to accomplish this task, their relative merits and limitations remain somewhat obscure. By analyzing a number of concrete examples, we show that low-complexity algorithms systematically fail when the Markov random field develops long-range correlations. More precisely, this phenomenon appears to be related to the Ising model phase transition (although it does not coincide with it).


Content Modeling Using Latent Permutations

Journal of Artificial Intelligence Research

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.


Sparsification and feature selection by compressive linear regression

arXiv.org Machine Learning

The Minimum Description Length (MDL) principle states that the optimal model for a given data set is that which compresses it best. Due to practial limitations the model can be restricted to a class such as linear regression models, which we address in this study. As in other formulations such as the LASSO and forward step-wise regression we are interested in sparsifying the feature set while preserving generalization ability. We derive a well-principled set of codes for both parameters and error residuals along with smooth approximations to lengths of these codes as to allow gradient descent optimization of description length, and go on to show that sparsification and feature selection using our approach is faster than the LASSO on several datasets from the UCI and StatLib repositories, with favorable generalization accuracy, while being fully automatic, requiring neither cross-validation nor tuning of regularization hyper-parameters, allowing even for a nonlinear expansion of the feature set followed by sparsification.


YQX Plays Chopin

AI Magazine

The article is about AI research in the context of a complex artistic behavior: expressive music performance. A computer program is presented that learns to play piano with 'expression' and that even won an international computer piano performance contest. A superficial analysis of an expressive performance generated by the system seems to suggest creative musical abilities. After a critical discussion of the processes underlying this behavior, we abandon the question of whether the system is really creative, and turn to the true motivation that drives this research: to use AI methods to investigate and better understand music performance as a human creative behavior. A number of recent and current results from our research are briefly presented that indicate that machines can give us interesting insights into such a complex creative behavior, even if they may not be creative themselves.


Learning Class-Level Bayes Nets for Relational Data

arXiv.org Artificial Intelligence

Many databases store data in relational format, with different types of entities and information about links between the entities. The field of statistical-relational learning (SRL) has developed a number of new statistical models for such data. In this paper we focus on learning class-level or first-order dependencies, which model the general database statistics over attributes of linked objects and links (e.g., the percentage of A grades given in computer science classes). Class-level statistical relationships are important in themselves, and they support applications like policy making, strategic planning, and query optimization. Most current SRL methods find class-level dependencies, but their main task is to support instance-level predictions about the attributes or links of specific entities. We focus only on class-level prediction, and describe algorithms for learning class-level models that are orders of magnitude faster for this task. Our algorithms learn Bayes nets with relational structure, leveraging the efficiency of single-table nonrelational Bayes net learners. An evaluation of our methods on three data sets shows that they are computationally feasible for realistic table sizes, and that the learned structures represent the statistical information in the databases well. After learning compiles the database statistics into a Bayes net, querying these statistics via Bayes net inference is faster than with SQL queries, and does not depend on the size of the database.


Efficient Bayesian analysis of multiple changepoint models with dependence across segments

arXiv.org Machine Learning

We consider Bayesian analysis of a class of multiple changepoint models. While there are a variety of efficient ways to analyse these models if the parameters associated with each segment are independent, there are few general approaches for models where the parameters are dependent. Under the assumption that the dependence is Markov, we propose an efficient online algorithm for sampling from an approximation to the posterior distribution of the number and position of the changepoints. In a simulation study, we show that the approximation introduced is negligible. We illustrate the power of our approach through fitting piecewise polynomial models to data, under a model which allows for either continuity or discontinuity of the underlying curve at each changepoint. This method is competitive with, or out-performs, other methods for inferring curves from noisy data; and uniquely it allows for inference of the locations of discontinuities in the underlying curve.


An Immune Inspired Approach to Anomaly Detection

arXiv.org Artificial Intelligence

The immune system provides a rich metaphor for computer security: anomaly detection that works in nature should work for machines. However, early artificial immune system approaches for computer security had only limited success. Arguably, this was due to these artificial systems being based on too simplistic a view of the immune system. We present here a second generation artificial immune system for process anomaly detection. It improves on earlier systems by having different artificial cell types that process information. Following detailed information about how to build such second generation systems, we find that communication between cells types is key to performance. Through realistic testing and validation we show that second generation artificial immune systems are capable of anomaly detection beyond generic system policies. The paper concludes with a discussion and outline of the next steps in this exciting area of computer security.


$L_0$ regularized estimation for nonlinear models that have sparse underlying linear structures

arXiv.org Machine Learning

We study the estimation of $\beta$ for the nonlinear model $y = f(X\sp{\top}\beta) + \epsilon$ when $f$ is a nonlinear transformation that is known, $\beta$ has sparse nonzero coordinates, and the number of observations can be much smaller than that of parameters ($n\ll p$). We show that in order to bound the $L_2$ error of the $L_0$ regularized estimator $\hat\beta$, i.e., $\|\hat\beta - \beta\|_2$, it is sufficient to establish two conditions. Based on this, we obtain bounds of the $L_2$ error for (1) $L_0$ regularized maximum likelihood estimation (MLE) for exponential linear models and (2) $L_0$ regularized least square (LS) regression for the more general case where $f$ is analytic. For the analytic case, we rely on power series expansion of $f$, which requires taking into account the singularities of $f$.


Finite element model selection using Particle Swarm Optimization

arXiv.org Artificial Intelligence

This paper proposes the application of particle swarm optimization (PSO) to the problem of finite element model (FEM) selection. This problem arises when a choice of the best model for a system has to be made from set of competing models, each developed a priori from engineering judgment. PSO is a population-based stochastic search algorithm inspired by the behaviour of biological entities in nature when they are foraging for resources. Each potentially correct model is represented as a particle that exhibits both individualistic and group behaviour. Each particle moves within the model search space looking for the best solution by updating the parameters values that define it. The most important step in the particle swarm algorithm is the method of representing models which should take into account the number, location and variables of parameters to be updated. One example structural system is used to show the applicability of PSO in finding an optimal FEM. An optimal model is defined as the model that has the least number of updated parameters and has the smallest parameter variable variation from the mean material properties. Two different objective functions are used to compare performance of the PSO algorithm.