Goto

Collaborating Authors

 Genre


Bio-inspired data mining: Treating malware signatures as biosequences

arXiv.org Machine Learning

The application of machine learning to bioinformatics problems is well established. Less well understood is the application of bioinformatics techniques to machine learning and, in particular, the representation of non-biological data as biosequences. The aim of this paper is to explore the effects of giving amino acid representation to problematic machine learning data and to evaluate the benefits of supplementing traditional machine learning with bioinformatics tools and techniques. The signatures of 60 computer viruses and 60 computer worms were converted into amino acid representations and first multiply aligned separately to identify conserved regions across different families within each class (virus and worm). This was followed by a second alignment of all 120 aligned signatures together so that non-conserved regions were identified prior to input to a number of machine learning techniques. Differences in length between virus and worm signatures after the first alignment were resolved by the second alignment. Our first set of experiments indicates that representing computer malware signatures as amino acid sequences followed by alignment leads to greater classification and prediction accuracy. Our second set of experiments indicates that checking the results of data mining from artificial virus and worm data against known proteins can lead to generalizations being made from the domain of naturally occurring proteins to malware signatures. However, further work is needed to determine the advantages and disadvantages of different representations and sequence alignment methods for handling problematic machine learning data.


A consistent clustering-based approach to estimating the number of change-points in highly dependent time-series

arXiv.org Machine Learning

The problem of change-point estimation is considered under a general framework where the data are generated by unknown stationary ergodic process distributions. In this context, the consistent estimation of the number of change-points is provably impossible. However, it is shown that a consistent clustering method may be used to estimate the number of change points, under the additional constraint that the correct number of process distributions that generate the data is provided. This additional parameter has a natural interpretation in many real-world applications. An algorithm is proposed that estimates the number of change-points and locates the changes. The proposed algorithm is shown to be asymptotically consistent; its empirical evaluations are provided.


Constraining Influence Diagram Structure by Generative Planning: An Application to the Optimization of Oil Spill Response

arXiv.org Artificial Intelligence

This paper works through the optimization of a real world planning problem, with a combination of a generative planning tool and an influence diagram solver. The problem is taken from an existing application in the domain of oil spill emergency response. The planning agent manages constraints that order sets of feasible equipment employment actions. This is mapped at an intermediate level of abstraction onto an influence diagram. In addition, the planner can apply a surveillance operator that determines observability of the state---the unknown trajectory of the oil. The uncertain world state and the objective function properties are part of the influence diagram structure, but not represented in the planning agent domain. By exploiting this structure under the constraints generated by the planning agent, the influence diagram solution complexity simplifies considerably, and an optimum solution to the employment problem based on the objective function is found. Finding this optimum is equivalent to the simultaneous evaluation of a range of plans. This result is an example of bounded optimality, within the limitations of this hybrid generative planner and influence diagram architecture.


A Discovery Algorithm for Directed Cyclic Graphs

arXiv.org Artificial Intelligence

Directed acyclic graphs have been used fruitfully to represent causal strucures (Pearl 1988). However, in the social sciences and elsewhere models are often used which correspond both causally and statistically to directed graphs with directed cycles (Spirtes 1995). Pearl (1993) discussed predicting the effects of intervention in models of this kind, so-called linear non-recursive structural equation models. This raises the question of whether it is possible to make inferences about causal structure with cycles, form sample data. In particular do there exist general, informative, feasible and reliable precedures for inferring causal structure from conditional independence relations among variables in a sample generated by an unknown causal structure? In this paper I present a discovery algorithm that is correct in the large sample limit, given commonly (but often implicitly) made plausible assumptions, and which provides information about the existence or non-existence of causal pathways from one variable to another. The algorithm is polynomial on sparse graphs.


Propagation of 2-Monotone Lower Probabilities on an Undirected Graph

arXiv.org Artificial Intelligence

Lower and upper probabilities, also known as Choquet capacities, are widely used as a convenient representation for sets of probability distributions. This paper presents a graphical decomposition and exact propagation algorithm for computing marginal posteriors of 2-monotone lower probabilities (equivalently, 2-alternating upper probabilities).


On the Sample Complexity of Learning Bayesian Networks

arXiv.org Machine Learning

In recent years there has been an increasing interest in learning Bayesian networks from data. One of the most effective methods for learning such networks is based on the minimum description length (MDL) principle. Previous work has shown that this learning procedure is asymptotically successful: with probability one, it will converge to the target distribution, given a sufficient number of samples. However, the rate of this convergence has been hitherto unknown. In this work we examine the sample complexity of MDL based learning procedures for Bayesian networks. We show that the number of samples needed to learn an epsilon-close approximation (in terms of entropy distance) with confidence delta is O((1/epsilon)^(4/3)log(1/epsilon)log(1/delta)loglog (1/delta)). This means that the sample complexity is a low-order polynomial in the error threshold and sub-linear in the confidence bound. We also discuss how the constants in this term depend on the complexity of the target distribution. Finally, we address questions of asymptotic minimality and propose a method for using the sample complexity results to speed up the learning process.


Learning Bayesian Networks with Local Structure

arXiv.org Artificial Intelligence

In this paper we examine a novel addition to the known methods for learning Bayesian networks from data that improves the quality of the learned networks. Our approach explicitly represents and learns the local structure in the conditional probability tables (CPTs), that quantify these networks. This increases the space of possible models, enabling the representation of CPTs with a variable number of parameters that depends on the learned local structures. The resulting learning procedure is capable of inducing models that better emulate the real complexity of the interactions present in the data. We describe the theoretical foundations and practical aspects of learning local structures, as well as an empirical evaluation of the proposed method. This evaluation indicates that learning curves characterizing the procedure that exploits the local structure converge faster than these of the standard procedure. Our results also show that networks learned with local structure tend to be more complex (in terms of arcs), yet require less parameters.


Quasi-Bayesian Strategies for Efficient Plan Generation: Application to the Planning to Observe Problem

arXiv.org Artificial Intelligence

Quasi-Bayesian theory uses convex sets of probability distributions and expected loss to represent preferences about plans. The theory focuses on decision robustness, i.e., the extent to which plans are affected by deviations in subjective assessments of probability. The present work presents solutions for plan generation when robustness of probability assessments must be included: plans contain information about the robustness of certain actions. The surprising result is that some problems can be solved faster in the Quasi-Bayesian framework than within usual Bayesian theory. We investigate this on the planning to observe problem, i.e., an agent must decide whether to take new observations or not. The fundamental question is: How, and how much, to search for a "best" plan, based on the robustness of probability assessments? Plan generation algorithms are derived in the context of material classification with an acoustic robotic probe. A package that constructs Quasi-Bayesian plans is available through anonymous ftp.


Bayesian Learning of Loglinear Models for Neural Connectivity

arXiv.org Machine Learning

This paper presents a Bayesian approach to learning the connectivity structure of a group of neurons from data on configuration frequencies. A major objective of the research is to provide statistical tools for detecting changes in firing patterns with changing stimuli. Our framework is not restricted to the well-understood case of pair interactions, but generalizes the Boltzmann machine model to allow for higher order interactions. The paper applies a Markov Chain Monte Carlo Model Composition (MC3) algorithm to search over connectivity structures and uses Laplace's method to approximate posterior probabilities of structures. Performance of the methods was tested on synthetic data. The models were also applied to data obtained by Vaadia on multi-unit recordings of several neurons in the visual cortex of a rhesus monkey in two different attentional states. Results confirmed the experimenters' conjecture that different attentional states were associated with different interaction structures.


Joint variable and rank selection for parsimonious estimation of high-dimensional matrices

arXiv.org Machine Learning

We propose dimension reduction methods for sparse, high-dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower-dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by considering them jointly. We motivate a new class of sparse multivariate regression models, in which the coefficient matrix has low rank and zero rows or can be well approximated by such a matrix. Next, we introduce estimators that are based on penalized least squares, with novel penalties that impose simultaneous row and rank restrictions on the coefficient matrix. We prove that these estimators indeed adapt to the unknown matrix sparsity and have fast rates of convergence. We support our theoretical results with an extensive simulation study and two data analyses.