Goto

Collaborating Authors

 Bayesian Learning


Discrete Temporal Models of Social Networks

arXiv.org Machine Learning

The field of social network analysis is concerned with populations of actors, interconnected by a set of relations (e.g., friendship, communication, etc.). These relationships can be concisely described by directed graphs, with one vertex for each actor and an edge for each relation between a pair of actors. This network representation of a population can provide insight into organizational structures, social behavior patterns, emergence of global structure from local dynamics, and a variety of other social phenomena. There has been increasing demand for flexible statistical models of social networks, for the purposes of scientific exploration and as a basis for practical analysis and data mining tools. The subject of modeling a static social network has been investigated in some depth. For time-invariant networks, represented as a single directed or undirected graph, a number of flexible statistical models have been proposed, including the classic Exponential Random Graph Models (ERGM) and extensions (Frank and Strauss, 1986; Wasserman and Robins, 2005; Snijders, 2002; Robins and Pattison, 2005), which are descriptive in nature, latent space models that aim towards clustering and community discovery (Handcock and Raftery, 2007), and mixed-membership block models for role discovery (Airoldi et al., 2008). Of particular relevance to this paper is the ERGM, which is particularly flexible in that it can be customized to capture a wide range of signature connectivity patterns in the network via user-specified functions representing their sufficient statistics. Specifically, if N is some representation of a social network, and N is the set of all possible networks in this representation, then the probability distribution function for any ERGM can be written in the following general 2 form.


Optimal Value of Information in Graphical Models

Journal of Artificial Intelligence Research

Many real-world decision making tasks require us to choose among several expensive observations. In a sensor network, for example, it is important to select the subset of sensors that is expected to provide the strongest reduction in uncertainty. In medical decision making tasks, one needs to select which tests to administer before deciding on the most effective treatment. It has been general practice to use heuristic-guided procedures for selecting observations. In this paper, we present the first efficient optimal algorithms for selecting observations for a class of probabilistic graphical models. For example, our algorithms allow to optimally label hidden variables in Hidden Markov Models (HMMs). We provide results for both selecting the optimal subset of observations, and for obtaining an optimal conditional observation plan. Furthermore we prove a surprising result: In most graphical models tasks, if one designs an efficient algorithm for chain graphs, such as HMMs, this procedure can be generalized to polytree graphical models. We prove that the optimizing value of information is $NP^{PP}$-hard even for polytrees. It also follows from our results that just computing decision theoretic value of information objective functions, which are commonly used in practice, is a #P-complete problem even on Naive Bayes models (a simple special case of polytrees). In addition, we consider several extensions, such as using our algorithms for scheduling observation selection for multiple sensors. We demonstrate the effectiveness of our approach on several real-world datasets, including a prototype sensor network deployment for energy conservation in buildings.


Graphical Probabilistic Routing Model for OBS Networks with Realistic Traffic Scenario

arXiv.org Artificial Intelligence

Burst contention is a well-known challenging problem in Optical Burst Switching (OBS) networks. Contention resolution approaches are always reactive and attempt to minimize the BLR based on local information available at the core node. On the other hand, a proactive approach that avoids burst losses before they occur is desirable. To reduce the probability of burst contention, a more robust routing algorithm than the shortest path is needed. This paper proposes a new routing mechanism for JET-based OBS networks, called Graphical Probabilistic Routing Model (GPRM) that selects less utilized links, on a hop-by-hop basis by using a bayesian network. We assume no wavelength conversion and no buffering to be available at the core nodes of the OBS network. We simulate the proposed approach under dynamic load to demonstrate that it reduces the Burst Loss Ratio (BLR) compared to static approaches by using Network Simulator 2 (ns-2) on NSFnet network topology and with realistic traffic matrix. Simulation results clearly show that the proposed approach outperforms static approaches in terms of BLR.


Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks

arXiv.org Machine Learning

We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator.


Efficient Markov Network Structure Discovery Using Independence Tests

Journal of Artificial Intelligence Research

We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearl's well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding the performance of statistical tests to estimate them. To accomplish this efficiently GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network with comparable or in some cases improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearl's theorems on the known conditional independence tests. The results of this comparison show that GSIMN, by the sole use of the Triangle theorem, is nearly optimal in terms of the set of independences tests that it infers.


Leveraging Consensus and Divergence in Bayesian Belief Aggregation

AAAI Conferences

Many fields have a need to build representative or predictive models from a number of unique individuals who each can contribute their experience and beliefs to the whole. For instance, intelligence agencies may wish to build a model from a number of experts to analyze potential terrorist attacks. In addition, a sociological survey may want a model representing the beliefs of cultural or political groups. However, challenges remain that have limited the success of merging opinions to form consensus models. Our research in progress presents a new approach to combine, or aggregate the beliefs of many individuals using graphical models. Existing Bayesian belief aggregation methods utilize an opinion pool function to find a single consensus on a given probability distribution. These opinion pool functions have many theoretical problems including breaking several assumptions for Bayesian reasoning. More practically, existing opinion pool functions do not represent reality well, especially in cases of diverse opinions.


Not So Naive Online Bayesian Spam Filter

AAAI Conferences

Spam filtering, as a key problem in electronic communication, has drawn significant attention due to increasingly huge amounts of junk email on the Internet. Content-based filtering is one reliable method in combating with spammers' changing tactics. Naive Bayes (NB) is one of the earliest content-based machine learning methods both in theory and practice in combating with spammers, which is easy to implement while can achieve considerable accuracy. In this paper, the traditional online Bayesian classifier are enhanced  by two ways. First, from theory's point of view, we devise a self-adaptive mechanism to gradually weaken the assumption of independence required by original NB in the online training process, and as a result of that our NSNB is no longer ``naive''. Second, we propose other engineering ways to make the classifier more robust and accuracy. The experiment results show that our NSNB does give state-of-the-art classification performance on online spam filtering on large benchmark data sets while it is extremely fast and takes up little memory in comparison with other statistical methods.


Visualizing Topics with Multi-Word Expressions

arXiv.org Machine Learning

We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant $n$-grams related to a topic, which are then used to help understand and interpret the underlying distribution. Compared with the usual visualization, which simply lists the most probable topical terms, the multi-word expressions provide a better intuitive impression for what a topic is "about." Our approach is based on a language model of arbitrary length expressions, for which we develop a new methodology based on nested permutation tests to find significant phrases. We show that this method outperforms the more standard use of $\chi^2$ and likelihood ratio tests. We illustrate the topic presentations on corpora of scientific abstracts and news articles.


Bayesian Agglomerative Clustering with Coalescents

arXiv.org Machine Learning

We introduce a new Bayesian model for hierarchical clustering based on a prior over trees called Kingman's coalescent. We develop novel greedy and sequential Monte Carlo inferences which operate in a bottom-up agglomerative fashion. We show experimentally the superiority of our algorithms over others, and demonstrate our approach in document clustering and phylolinguistics.


Open Problems in Universal Induction & Intelligence

arXiv.org Artificial Intelligence

Specialized intelligent systems can be found everywhere: finger print, handwriting, speech, and face recognition, spam filtering, chess and other game programs, robots, et al. This decade the first presumably complete mathematical theory of artificial intelligence based on universal induction-prediction-decision-action has been proposed. This information-theoretic approach solidifies the foundations of inductive inference and artificial intelligence. Getting the foundations right usually marks a significant progress and maturing of a field. The theory provides a gold standard and guidance for researchers working on intelligent algorithms. The roots of universal induction have been laid exactly half-a-century ago and the roots of universal intelligence exactly one decade ago. So it is timely to take stock of what has been achieved and what remains to be done. Since there are already good recent surveys, I describe the state-of-the-art only in passing and refer the reader to the literature. This article concentrates on the open problems in universal induction and its extension to universal intelligence.