Directed Networks
Nonparametric Bayesian Models for Unsupervised Event Coreference Resolution
Bejan, Cosmin, Titsworth, Matthew, Hickl, Andrew, Harabagiu, Sanda
We present a sequence of unsupervised, nonparametric Bayesian models for clustering complex linguistic objects. In this approach, we consider a potentially infinite number of features and categorical outcomes. We evaluate these models for the task of within- and cross-document event coreference on two corpora. All the models we investigated show significant improvements when compared against an existing baseline for this task.
MedLDA: A General Framework of Maximum Margin Supervised Topic Models
Zhu, Jun, Ahmed, Amr, Xing, Eric P.
Supervised topic models utilize document's side information for discovering predictive low dimensional representations of documents. Existing models apply the likelihood-based estimation. In this paper, we present a general framework of max-margin supervised topic models for both continuous and categorical response variables. Our approach, the maximum entropy discrimination latent Dirichlet allocation (MedLDA), utilizes the max-margin principle to train supervised topic models and estimate predictive topic representations that are arguably more suitable for prediction tasks. The general principle of MedLDA can be applied to perform joint max-margin learning and maximum likelihood estimation for arbitrary topic models, directed or undirected, and supervised or unsupervised, when the supervised side information is available. We develop efficient variational methods for posterior inference and parameter estimation, and demonstrate qualitatively and quantitatively the advantages of MedLDA over likelihood-based topic models on movie review and 20 Newsgroups data sets.
A survey of statistical network models
Goldenberg, Anna, Zheng, Alice X, Fienberg, Stephen E, Airoldi, Edoardo M
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities has intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning
There has been a lot of recent work on Bayesian methods for reinforcement learning exhibiting near-optimal online performance. The main obstacle facing such methods is that in most problems of interest, the optimal solution involves planning in an infinitely large tree. However, it is possible to obtain stochastic lower and upper bounds on the value of each tree node. This enables us to use stochastic branch and bound algorithms to search the tree efficiently. This paper proposes two such algorithms and examines their complexity in this setting.
On Finding Predictors for Arbitrary Families of Processes
The problem is sequence prediction in the following setting. A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\mu$ belongs to an arbitrary but known class $C$ of stochastic process measures. We are interested in predictors $\rho$ whose conditional probabilities converge (in some sense) to the "true" $\mu$-conditional probabilities if any $\mu\in C$ is chosen to generate the sequence. The contribution of this work is in characterizing the families $C$ for which such predictors exist, and in providing a specific and simple form in which to look for a solution. We show that if any predictor works, then there exists a Bayesian predictor, whose prior is discrete, and which works too. We also find several sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family $C$, as well as in terms of local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors. It should be emphasized that the framework is completely general: the stochastic processes considered are not required to be i.i.d., stationary, or to belong to any parametric or countable family.
Closing the Learning-Planning Loop with Predictive State Representations
Boots, Byron, Siddiqi, Sajid M., Gordon, Geoffrey J.
A central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. In this paper we propose and demonstrate a novel algorithm which accurately learns a model of such an environment directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. Specifically, we present an efficient and statistically consistent spectral algorithm for learning the parameters of a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then perform approximate point-based planning in the learned PSR. Analysis of our results shows that the algorithm learns a state space which efficiently captures the essential features of the environment. This representation allows accurate prediction with a small number of parameters, and enables successful and efficient planning.
The Cultural Geography Model: An Agent Based Modeling Framework for Analysis of the Impact of Culture in Irregular Warfare
Alt, Jon (U.S. Army Training and Doctrine Command Analysis Center) | Lieberman, Stephen T. (U.S. Army Training and Doctrine Command Analysis Center)
The development of tools to provide insight into the behavioral response of a civilian population will greatly benefit the modeling and simulation community and have potential applications across multiple user communities in the U.S. Department of Defense. We present an overview of a modular agent-based modeling framework, grounded in the human behavioral and social theory, which is intended to represent a populations’ stance on issues as a function of their changing beliefs, values and interests. We utilize and integrate theories of narrative identity [1] and planned behavior [2] with macrosociological theories of heterogeneity and influence [3][4] to model civilian behavior in a conflict ecosystem. Communication between agents takes place across a social network developed using real data about the population under consideration, and essential services are implemented as objects within the model allowing for experimentation with different courses of action for development of civil service capacity. We describe the theoretical underpinnings of the model, the current state of implementation, potential use cases, and the path forward for future work.
How to Explain Individual Classification Decisions
Baehrens, David, Schroeter, Timon, Harmeling, Stefan, Kawanabe, Motoaki, Hansen, Katja, Mueller, Klaus-Robert
After building a classifier with modern tools of machine learning we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question what is the most likely label of a given unseen data point. However, most methods will provide no answer why the model predicted the particular label for a single instance and what features were most influential for that particular instance. The only method that is currently able to provide such explanations are decision trees. This paper proposes a procedure which (based on a set of assumptions) allows to explain the decisions of any classification method.
`Plausibilities of plausibilities': an approach through circumstances
Mana, P. G. L. Porta, Månsson, A., Björk, G.
Probability-like parameters appearing in some statistical models, and their prior distributions, are reinterpreted through the notion of `circumstance', a term which stands for any piece of knowledge that is useful in assigning a probability and that satisfies some additional logical properties. The idea, which can be traced to Laplace and Jaynes, is that the usual inferential reasonings about the probability-like parameters of a statistical model can be conceived as reasonings about equivalence classes of `circumstances' - viz., real or hypothetical pieces of knowledge, like e.g. physical hypotheses, that are useful in assigning a probability and satisfy some additional logical properties - that are uniquely indexed by the probability distributions they lead to.
The Laplace-Jaynes approach to induction
Mana, P. G. L. Porta, Månsson, A., Björk, G.
An approach to induction is presented, based on the idea of analysing the context of a given problem into `circumstances'. This approach, fully Bayesian in form and meaning, provides a complement or in some cases an alternative to that based on de Finetti's representation theorem and on the notion of infinite exchangeability. In particular, it gives an alternative interpretation of those formulae that apparently involve `unknown probabilities' or `propensities'. Various advantages and applications of the presented approach are discussed, especially in comparison to that based on exchangeability. Generalisations are also discussed.