Bayesian Inference
A survey of statistical network models
Goldenberg, Anna, Zheng, Alice X, Fienberg, Stephen E, Airoldi, Edoardo M
Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, together with a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this growing literature. We begin with an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
On Finding Predictors for Arbitrary Families of Processes
We consider sequence prediction in the following setting. A sequence $x_1,...,x_n,...$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\mu$ belongs to an arbitrary but known class $C$ of stochastic process measures. We are interested in predictors $\rho$ whose conditional probabilities converge (in some sense) to the "true" $\mu$-conditional probabilities if any $\mu\in C$ is chosen to generate the sequence. The contribution of this work is in characterizing the families $C$ for which such predictors exist, and in providing a specific and simple form in which to look for a solution. We show that if any predictor works, then there exists a Bayesian predictor with a discrete prior that works too. We also find several sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family $C$, as well as in terms of local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors. It should be emphasized that the framework is completely general: the stochastic processes considered are not required to be i.i.d., stationary, or to belong to any parametric or countable family.
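The Bayesian predictor mentioned above is a mixture $\rho = \sum_i w_i \mu_i$ over a discrete prior, whose prediction at each step is the posterior-weighted average of the members' conditional probabilities. A minimal sketch, assuming binary observations and a small hypothetical class $C$ given by each measure's conditional probability of the next symbol being 1 (the class, prior, and function names are illustrative, not from the paper):

```python
from typing import Callable, List, Sequence

# A measure is represented by its conditional probability P(next = 1 | history).
CondProb = Callable[[Sequence[int]], float]

def mixture_predictions(measures: List[CondProb], prior: List[float],
                        seq: Sequence[int]) -> List[float]:
    """Return rho(x_{t+1} = 1 | x_1..x_t) for t = 0, ..., len(seq)."""
    weights = list(prior)
    preds = []
    for t in range(len(seq) + 1):
        p1 = [m(seq[:t]) for m in measures]
        # Mixture prediction: posterior-weighted average of each measure's prediction
        preds.append(sum(w * p for w, p in zip(weights, p1)))
        if t < len(seq):
            x = seq[t]
            # Bayes update: weight times the likelihood of the observed symbol
            weights = [w * (p if x == 1 else 1.0 - p) for w, p in zip(weights, p1)]
            total = sum(weights)
            weights = [w / total for w in weights]  # renormalize to avoid underflow
    return preds

# Hypothetical toy class: two i.i.d. coins and a Markov "alternator"
def coin_02(h: Sequence[int]) -> float:
    return 0.2

def coin_08(h: Sequence[int]) -> float:
    return 0.8

def alternator(h: Sequence[int]) -> float:
    if not h:
        return 0.5
    return 0.9 if h[-1] == 0 else 0.1

seq = [1, 1, 1, 1, 0] * 40
preds = mixture_predictions([coin_02, coin_08, alternator], [1/3, 1/3, 1/3], seq)
```

On this sequence the posterior concentrates on `coin_08`, so the final prediction is close to 0.8 even though the last symbol is 0.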
The Cultural Geography Model: An Agent Based Modeling Framework for Analysis of the Impact of Culture in Irregular Warfare
Alt, Jon (U.S. Army Training and Doctrine Command Analysis Center) | Lieberman, Stephen T. (U.S. Army Training and Doctrine Command Analysis Center)
The development of tools to provide insight into the behavioral response of a civilian population will greatly benefit the modeling and simulation community and have potential applications across multiple user communities in the U.S. Department of Defense. We present an overview of a modular agent-based modeling framework, grounded in human behavioral and social theory, which is intended to represent a population's stance on issues as a function of its changing beliefs, values and interests. We utilize and integrate theories of narrative identity [1] and planned behavior [2] with macrosociological theories of heterogeneity and influence [3][4] to model civilian behavior in a conflict ecosystem. Communication between agents takes place across a social network developed using real data about the population under consideration, and essential services are implemented as objects within the model, allowing for experimentation with different courses of action for development of civil service capacity. We describe the theoretical underpinnings of the model, the current state of implementation, potential use cases, and the path forward for future work.
How to Explain Individual Classification Decisions
Baehrens, David, Schroeter, Timon, Harmeling, Stefan, Kawanabe, Motoaki, Hansen, Katja, Mueller, Klaus-Robert
After building a classifier with modern tools of machine learning, we typically have a black box at hand that predicts well on unseen data. Thus, we get an answer to the question of what the most likely label of a given unseen data point is. However, most methods provide no answer to why the model predicted a particular label for a single instance, or which features were most influential for that particular instance. Currently, decision trees are the only method able to provide such explanations. This paper proposes a procedure which (based on a set of assumptions) allows one to explain the decisions of any classification method.
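The abstract does not spell out the procedure. One common way to explain a single decision is a local gradient: how the predicted class probability changes as each feature of that one instance is perturbed. A minimal sketch, assuming a black box that returns a class probability; the central-difference scheme, the logistic model, and all names below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def explanation_vector(predict_proba, x, eps=1e-5):
    """Central-difference gradient of predict_proba at x, one entry per feature."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # Sensitivity of the predicted probability to feature i at this instance
        grad[i] = (predict_proba(x + step) - predict_proba(x - step)) / (2 * eps)
    return grad

# Hypothetical black box: a logistic model with fixed weights
w = np.array([2.0, -1.0, 0.0])
b = 0.5

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x0 = np.array([0.3, 1.2, -0.7])
g = explanation_vector(predict_proba, x0)
```

Large-magnitude entries of `g` mark the features that most influence this particular prediction; the third feature, which the model ignores, gets a gradient of (numerically) zero.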
`Plausibilities of plausibilities': an approach through circumstances
Mana, P. G. L. Porta, Månsson, A., Björk, G.
Probability-like parameters appearing in some statistical models, and their prior distributions, are reinterpreted through the notion of `circumstance', a term that stands for any piece of knowledge that is useful in assigning a probability and that satisfies some additional logical properties. The idea, which can be traced to Laplace and Jaynes, is that the usual inferential reasoning about the probability-like parameters of a statistical model can be conceived as reasoning about equivalence classes of `circumstances' (real or hypothetical pieces of knowledge, e.g. physical hypotheses) that are uniquely indexed by the probability distributions they lead to.
The Laplace-Jaynes approach to induction
Mana, P. G. L. Porta, Månsson, A., Björk, G.
An approach to induction is presented, based on the idea of analysing the context of a given problem into `circumstances'. This approach, fully Bayesian in form and meaning, provides a complement to, and in some cases an alternative to, the approach based on de Finetti's representation theorem and on the notion of infinite exchangeability. In particular, it gives an alternative interpretation of those formulae that apparently involve `unknown probabilities' or `propensities'. Various advantages and applications of the presented approach are discussed, especially in comparison with the approach based on exchangeability. Generalisations are also discussed.
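A classic instance of reasoning over circumstances indexed by the probabilities they lead to is Laplace's rule of succession. A minimal sketch, assuming binary outcomes, circumstances indexed by a probability $q$ on the grid $0, 1/m, \ldots, 1$, and equal prior weight on each (the grid, the uniform weights, and the function name are illustrative assumptions, not the paper's formalism):

```python
from fractions import Fraction

def predictive_prob(k: int, n: int, m: int) -> Fraction:
    """P(next = 1 | k ones in n trials), averaging over m + 1 equally weighted
    circumstances, each indexed by the probability q it assigns to a 1."""
    qs = [Fraction(j, m) for j in range(m + 1)]
    # Likelihood of the observed sequence under each circumstance
    lik = [q ** k * (1 - q) ** (n - k) for q in qs]
    # Posterior-weighted probability of a 1 on the next trial
    num = sum(q * l for q, l in zip(qs, lik))
    den = sum(lik)
    return num / den

p = predictive_prob(3, 4, 1000)
```

As the grid is refined, the result approaches Laplace's $(k+1)/(n+2)$; here $k=3$, $n=4$ gives approximately $2/3$.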
Nonlinear Estimators and Tail Bounds for Dimension Reduction in $l_1$ Using Cauchy Random Projections
Li, Ping, Hastie, Trevor J., Church, Kenneth W.
For dimension reduction in $l_1$, the method of {\em Cauchy random projections} multiplies the original data matrix $\mathbf{A} \in\mathbb{R}^{n\times D}$ with a random matrix $\mathbf{R} \in \mathbb{R}^{D\times k}$ ($k\ll\min(n,D)$) whose entries are i.i.d. samples of the standard Cauchy C(0,1). Because of known impossibility results, one cannot hope to recover the pairwise $l_1$ distances in $\mathbf{A}$ from $\mathbf{B} = \mathbf{AR} \in \mathbb{R}^{n\times k}$ using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. The sample median estimator and the geometric mean estimator are asymptotically (as $k\to \infty$) equivalent, but the latter is more accurate at small $k$. We derive explicit tail bounds for the geometric mean estimator and establish an analog of the Johnson-Lindenstrauss (JL) lemma for dimension reduction in $l_1$, which is weaker than the classical JL lemma for dimension reduction in $l_2$. Asymptotically, both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.
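The key fact behind these estimators is 1-stability: for rows $a_1, a_2$ of $\mathbf{A}$, each entry of $(a_1 - a_2)\mathbf{R}$ is Cauchy $C(0, d)$ with $d = \|a_1 - a_2\|_1$, and $E|x|^{\lambda} = d^{\lambda}/\cos(\lambda\pi/2)$ for $|\lambda| < 1$. Taking $\lambda = 1/k$ gives the bias-corrected geometric mean $\cos^k(\pi/(2k)) \prod_j |x_j|^{1/k}$. A minimal NumPy sketch, assuming dense data; the sample median shown here omits the paper's finite-$k$ bias correction, and the MLE is not implemented:

```python
import numpy as np

def cauchy_projection(A: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """B = A R, where R (D x k) has i.i.d. standard Cauchy entries."""
    rng = np.random.default_rng(seed)
    R = rng.standard_cauchy(size=(A.shape[1], k))
    return A @ R

def gm_estimator(b1: np.ndarray, b2: np.ndarray) -> float:
    """Bias-corrected geometric mean estimate of ||a1 - a2||_1."""
    x = np.abs(b1 - b2)
    k = x.size
    # Geometric mean computed in log space; cos^k(pi / 2k) removes the bias
    return float(np.cos(np.pi / (2 * k)) ** k * np.exp(np.mean(np.log(x))))

def median_estimator(b1: np.ndarray, b2: np.ndarray) -> float:
    """Sample median of |x_j| (no finite-k correction); median of |C(0, d)| is d."""
    return float(np.median(np.abs(b1 - b2)))

# Usage on synthetic data (illustrative only)
A = np.random.default_rng(1).random((2, 500))
B = cauchy_projection(A, k=1000)
true_l1 = float(np.abs(A[0] - A[1]).sum())
gm = gm_estimator(B[0], B[1])
med = median_estimator(B[0], B[1])
```

Only the projected rows `B[0]` and `B[1]` (length $k$) are needed at query time, which is what makes the method attractive for data streams.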
When Ignorance is Bliss
Grunwald, Peter D., Halpern, Joseph Y.
It is commonly accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at hand. In the non-Bayesian analysis, we show how ignoring information avoids dilation, the phenomenon that additional pieces of information sometimes lead to an increase in uncertainty. In the Bayesian analysis, we show that for small sample sizes and certain prediction tasks, the Bayesian posterior based on a non-informative prior yields worse predictions than simply ignoring the given information.
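Dilation can be seen in a standard textbook example (not taken from the paper): $X$ and $Y$ are fair bits whose dependence is unknown, so the set of measures contains one law for each value of $P(X = Y)$. Every measure agrees that $P(X=1) = 1/2$, yet after observing $Y$ the set of conditional probabilities spans $[0, 1]$. A minimal sketch, assuming a grid of 11 measures chosen for illustration:

```python
from fractions import Fraction

def joint(p: Fraction) -> dict:
    """Joint law of two fair bits X, Y with P(X == Y) = p."""
    return {(1, 1): p / 2, (0, 0): p / 2, (1, 0): (1 - p) / 2, (0, 1): (1 - p) / 2}

# Set of probability measures: one per p in {0, 1/10, ..., 1}
credal = [joint(Fraction(i, 10)) for i in range(11)]

def prior_interval():
    """Lower and upper P(X = 1) over the set."""
    vals = [P[(1, 1)] + P[(1, 0)] for P in credal]
    return min(vals), max(vals)

def posterior_interval(y: int):
    """Lower and upper P(X = 1 | Y = y) over the set."""
    vals = [P[(1, y)] / (P[(1, y)] + P[(0, y)]) for P in credal]
    return min(vals), max(vals)

print(prior_interval())       # (Fraction(1, 2), Fraction(1, 2))
print(posterior_interval(1))  # (Fraction(0, 1), Fraction(1, 1))
```

Whatever value of $Y$ is observed, the interval for $X$ widens from a point to $[0, 1]$, so here ignoring $Y$ gives a sharper prediction than conditioning on it.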
Universal Algorithmic Intelligence: A mathematical top->down approach
Sequential decision theory formally solves the problem of rational agents in uncertain worlds if the true environmental prior probability distribution is known. Solomonoff's theory of universal induction formally solves the problem of sequence prediction for unknown prior distribution. We combine both ideas and get a parameter-free theory of universal Artificial Intelligence. We give strong arguments that the resulting AIXI model is the most intelligent unbiased agent possible. We outline how the AIXI model can formally solve a number of problem classes, including sequence prediction, strategic games, function minimization, reinforcement and supervised learning. The major drawback of the AIXI model is that it is uncomputable. To overcome this problem, we construct a modified algorithm AIXItl that is still effectively more intelligent than any other time-$t$ and length-$l$ bounded agent. The computation time of AIXItl is of the order $t \cdot 2^l$. The discussion includes formal definitions of intelligence order relations, the horizon problem and relations of the AIXI theory to other AI approaches.
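For reference, the AIXI action rule is commonly stated in Hutter's notation as below, combining expectimax planning up to a horizon $m$ with Solomonoff's universal mixture over environment programs $q$ (of length $\ell(q)$) run on a universal monotone Turing machine $U$; here $a$, $o$, $r$ denote actions, observations, and rewards and $k$ is the current cycle. This is given as an orientation aid, not quoted from the abstract:

$$
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q,\, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
$$

The innermost sum ranges over all programs consistent with the interaction history, which is why AIXI itself is uncomputable and motivates the time- and length-bounded AIXItl.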