Learning Graphical Models
Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks
Hausser, Jean, Strimmer, Korbinian
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator.
Efficient Markov Network Structure Discovery Using Independence Tests
Bromberg, F., Margaritis, D., Honavar, V.
We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearl's well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding the performance of statistical tests to estimate them. To accomplish this efficiently GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network with comparable or in some cases improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearl's theorems on the known conditional independence tests. The results of this comparison show that GSIMN, by the sole use of the Triangle theorem, is nearly optimal in terms of the set of independences tests that it infers.
Leveraging Consensus and Divergence in Bayesian Belief Aggregation
Greene, Kshanti Auster (University of New Mexico)
Many fields have a need to build representative or predictive models from a number of unique individuals who each can contribute their experience and beliefs to the whole. For instance, intelligence agencies may wish to build a model from a number of experts to analyze potential terrorist attacks. In addition, a sociological survey may want a model representing the beliefs of cultural or political groups. However, challenges remain that have limited the success of merging opinions to form consensus models. Our research in progress presents a new approach to combine, or aggregate the beliefs of many individuals using graphical models. Existing Bayesian belief aggregation methods utilize an opinion pool function to find a single consensus on a given probability distribution. These opinion pool functions have many theoretical problems including breaking several assumptions for Bayesian reasoning. More practically, existing opinion pool functions do not represent reality well, especially in cases of diverse opinions.
Estimating the Impact of Public and Private Strategies for Controlling an Epidemic: A Multi-Agent Approach
Barrett, Christopher L. (Virginia Polytechnic Institute and State University) | Bisset, Keith (Virginia Polytechnic Institute and State University) | Leidig, Jonathan (Virginia Polytechnic Institute and State University) | Marathe, Achla (Virginia Polytechnic Institute and State University) | Marathe, Madhav (Virginia Polytechnic Institute and State University)
This paper describes a novel approach based on a combination of techniques in AI, parallel computing, and network science to address an important problem in social sciences and public health: planning and responding in the event of epidemics. Spread of infectious disease is an important societal problem -- human behavior, social networks, and the civil infrastructures all play a crucial role in initiating and controlling such epidemic processes. We specifically consider the economic and social effects of realistic interventions proposed and adopted by public health officials and behavioral changes of private citizens in the event of a ``flu-like'' epidemic. Our results provide new insights for developing robust public policies that can prove useful for epidemic planning.
Not So Naive Online Bayesian Spam Filter
Su, Baojun (Zhejiang University) | Xu, Congfu (Zhejiang University)
Spam filtering, as a key problem in electronic communication, has drawn significant attention due to increasingly huge amounts of junk email on the Internet. Content-based filtering is one reliable method in combating with spammers' changing tactics. Naive Bayes (NB) is one of the earliest content-based machine learning methods both in theory and practice in combating with spammers, which is easy to implement while can achieve considerable accuracy. In this paper, the traditional online Bayesian classifier are enhanced by two ways. First, from theory's point of view, we devise a self-adaptive mechanism to gradually weaken the assumption of independence required by original NB in the online training process, and as a result of that our NSNB is no longer ``naive''. Second, we propose other engineering ways to make the classifier more robust and accuracy. The experiment results show that our NSNB does give state-of-the-art classification performance on online spam filtering on large benchmark data sets while it is extremely fast and takes up little memory in comparison with other statistical methods.
Visualizing Topics with Multi-Word Expressions
Blei, David M., Lafferty, John D.
We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant $n$-grams related to a topic, which are then used to help understand and interpret the underlying distribution. Compared with the usual visualization, which simply lists the most probable topical terms, the multi-word expressions provide a better intuitive impression for what a topic is "about." Our approach is based on a language model of arbitrary length expressions, for which we develop a new methodology based on nested permutation tests to find significant phrases. We show that this method outperforms the more standard use of $\chi^2$ and likelihood ratio tests. We illustrate the topic presentations on corpora of scientific abstracts and news articles.
Bayesian Agglomerative Clustering with Coalescents
Teh, Yee Whye, Daumé, Hal III, Roy, Daniel
We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.
Open Problems in Universal Induction & Intelligence
Specialized intelligent systems can be found everywhere: finger print, handwriting, speech, and face recognition, spam filtering, chess and other game programs, robots, et al. This decade the first presumably complete mathematical theory of artificial intelligence based on universal induction-prediction-decision-action has been proposed. This information-theoretic approach solidifies the foundations of inductive inference and artificial intelligence. Getting the foundations right usually marks a significant progress and maturing of a field. The theory provides a gold standard and guidance for researchers working on intelligent algorithms. The roots of universal induction have been laid exactly half-a-century ago and the roots of universal intelligence exactly one decade ago. So it is timely to take stock of what has been achieved and what remains to be done. Since there are already good recent surveys, I describe the state-of-the-art only in passing and refer the reader to the literature. This article concentrates on the open problems in universal induction and its extension to universal intelligence.
Characterization of the convergence of stationary Fokker-Planck learning
The convergence properties of the stationary Fokker-Planck algorithm for the estimation of the asymptotic density of stochastic search processes is studied. Theoretical and empirical arguments for the characterization of convergence of the estimation in the case of separable and nonseparable nonlinear optimization problems are given. Some implications of the convergence of stationary Fokker-Planck learning for the inference of parameters in artificial neural network models are outlined.
Learning Bayesian Network Equivalence Classes with Ant Colony Optimization
Bayesian networks are a useful tool in the representation of uncertain knowledge. This paper proposes a new algorithm called ACO-E, to learn the structure of a Bayesian network. It does this by conducting a search through the space of equivalence classes of Bayesian networks using Ant Colony Optimization (ACO). To this end, two novel extensions of traditional ACO techniques are proposed and implemented. Firstly, multiple types of moves are allowed. Secondly, moves can be given in terms of indices that are not based on construction graph nodes. The results of testing show that ACO-E performs better than a greedy search and other state-of-the-art and metaheuristic algorithms whilst searching in the space of equivalence classes.