Learning Graphical Models
Closing the Learning-Planning Loop with Predictive State Representations
Boots, Byron, Siddiqi, Sajid M., Gordon, Geoffrey J.
A central problem in artificial intelligence is that of planning to maximize future reward under uncertainty in a partially observable environment. In this paper we propose and demonstrate a novel algorithm which accurately learns a model of such an environment directly from sequences of action-observation pairs. We then close the loop from observations to actions by planning in the learned model and recovering a policy which is near-optimal in the original environment. Specifically, we present an efficient and statistically consistent spectral algorithm for learning the parameters of a Predictive State Representation (PSR). We demonstrate the algorithm by learning a model of a simulated high-dimensional, vision-based mobile robot planning task, and then perform approximate point-based planning in the learned PSR. Analysis of our results shows that the algorithm learns a state space which efficiently captures the essential features of the environment. This representation allows accurate prediction with a small number of parameters, and enables successful and efficient planning.
The Cultural Geography Model: An Agent Based Modeling Framework for Analysis of the Impact of Culture in Irregular Warfare
Alt, Jon (U.S. Army Training and Doctrine Command Analysis Center) | Lieberman, Stephen T. (U.S. Army Training and Doctrine Command Analysis Center)
The development of tools to provide insight into the behavioral response of a civilian population will greatly benefit the modeling and simulation community and have potential applications across multiple user communities in the U.S. Department of Defense. We present an overview of a modular agent-based modeling framework, grounded in human behavioral and social theory, which is intended to represent a population's stance on issues as a function of its changing beliefs, values and interests. We utilize and integrate theories of narrative identity [1] and planned behavior [2] with macrosociological theories of heterogeneity and influence [3][4] to model civilian behavior in a conflict ecosystem. Communication between agents takes place across a social network developed using real data about the population under consideration, and essential services are implemented as objects within the model, allowing for experimentation with different courses of action for development of civil service capacity. We describe the theoretical underpinnings of the model, the current state of implementation, potential use cases, and the path forward for future work.
How to Explain Individual Classification Decisions
Baehrens, David, Schroeter, Timon, Harmeling, Stefan, Kawanabe, Motoaki, Hansen, Katja, Mueller, Klaus-Robert
After building a classifier with modern tools of machine learning we typically have a black box at hand that is able to predict well for unseen data. Thus, we get an answer to the question of what the most likely label of a given unseen data point is. However, most methods provide no answer as to why the model predicted a particular label for a single instance and which features were most influential for that instance. Currently, the only method able to provide such explanations is the decision tree. This paper proposes a procedure which (based on a set of assumptions) allows one to explain the decisions of any classification method.
Positive Definite Kernels in Machine Learning
This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions $\{k(x,\cdot),x\in\mathcal{X}\}$ associated with a kernel $k$ defined on a space $\mathcal{X}$. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces and discuss the idea of combining several kernels to improve the performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data-types such as images, graphs or speech segments.
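As a rough illustration of what "positive definite" means for a kernel $k$, the sketch below builds a Gram matrix for a Gaussian (RBF) kernel and checks that the quadratic form $\sum_{i,j} c_i c_j k(x_i, x_j)$ is non-negative for random coefficient vectors. The choice of kernel, the function names, and the randomized check are assumptions made for this example and are not taken from the survey; a randomized check can only refute positive definiteness, never prove it.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def gram_matrix(points, kernel):
    """Matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(K, c):
    """Compute sum_{i,j} c[i] * c[j] * K[i][j]."""
    n = len(c)
    return sum(c[i] * c[j] * K[i][j] for i in range(n) for j in range(n))


def passes_psd_spot_check(K, trials=200, tol=1e-9, seed=0):
    """Return False if K is asymmetric or some random c gives c^T K c < 0.

    Passing this check is necessary but not sufficient for K to be
    positive semidefinite.
    """
    rng = random.Random(seed)
    n = len(K)
    for i in range(n):
        for j in range(n):
            if abs(K[i][j] - K[j][i]) > tol:
                return False
    for _ in range(trials):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if quadratic_form(K, c) < -tol:
            return False
    return True


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0), (0.3, 0.3)]
    print(passes_psd_spot_check(gram_matrix(pts, gaussian_kernel)))
```

A symmetric matrix such as `[[0, -1], [-1, 0]]` fails the same check, since its quadratic form is negative whenever both coefficients share a sign.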
`Plausibilities of plausibilities': an approach through circumstances
Mana, P. G. L. Porta, Månsson, A., Björk, G.
Probability-like parameters appearing in some statistical models, and their prior distributions, are reinterpreted through the notion of `circumstance', a term which stands for any piece of knowledge that is useful in assigning a probability and that satisfies some additional logical properties. The idea, which can be traced to Laplace and Jaynes, is that the usual inferential reasonings about the probability-like parameters of a statistical model can be conceived as reasonings about equivalence classes of `circumstances' - viz., real or hypothetical pieces of knowledge, such as physical hypotheses, that are useful in assigning a probability and satisfy some additional logical properties - that are uniquely indexed by the probability distributions they lead to.
The Laplace-Jaynes approach to induction
Mana, P. G. L. Porta, Månsson, A., Björk, G.
An approach to induction is presented, based on the idea of analysing the context of a given problem into `circumstances'. This approach, fully Bayesian in form and meaning, provides a complement or in some cases an alternative to that based on de Finetti's representation theorem and on the notion of infinite exchangeability. In particular, it gives an alternative interpretation of those formulae that apparently involve `unknown probabilities' or `propensities'. Various advantages and applications of the presented approach are discussed, especially in comparison to that based on exchangeability. Generalisations are also discussed.
Topology Induced Coarsening in Language Games
Baronchelli, A., Dall'Asta, L., Barrat, A., Loreto, V.
We investigate how very large populations are able to reach a global consensus, out of local "microscopic" interaction rules, in the framework of a recently introduced class of models of semiotic dynamics, the so-called Naming Game. We compare in particular the convergence mechanism for interacting agents embedded in a low-dimensional lattice with respect to the mean-field case. We highlight that in low dimensions consensus is reached through a coarsening process which requires less cognitive effort of the agents, with respect to the mean-field case, but takes longer to complete. In 1-d the dynamics of the boundaries is mapped onto a truncated Markov process from which we analytically compute the diffusion coefficient. More generally we show that the convergence process requires a memory per agent scaling as N and lasts a time N^{1+2/d} in dimension d<5 (d=4 being the upper critical dimension), while in mean-field both memory and time scale as N^{3/2}, for a population of N agents. We present analytical and numerical evidence supporting this picture.
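The abstract does not spell out the interaction rules, so the following is only a minimal sketch of a Naming Game on a 1-d ring, assuming the commonly described minimal rules: a speaker utters a word from its inventory (inventing one if the inventory is empty); if the neighbouring hearer already knows the word, both keep only that word, otherwise the hearer adds it. The function names, the periodic boundary, and the stopping rule are choices made for this example.

```python
import random


def has_consensus(inventories):
    """True when every agent holds exactly the same single word."""
    first = inventories[0]
    return len(first) == 1 and all(inv == first for inv in inventories)


def naming_game_1d(num_agents, max_steps, seed=0):
    """Run the assumed minimal Naming Game on a ring of agents.

    Returns (steps_to_consensus, inventories); steps_to_consensus is None
    if consensus was not reached within max_steps pairwise interactions.
    """
    rng = random.Random(seed)
    inventories = [set() for _ in range(num_agents)]
    next_word = 0  # counter used to invent brand-new words
    for step in range(1, max_steps + 1):
        speaker = rng.randrange(num_agents)
        # Hearer is the left or right lattice neighbour (periodic boundary).
        hearer = (speaker + rng.choice((-1, 1))) % num_agents
        if not inventories[speaker]:
            inventories[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(inventories[speaker]))
        if word in inventories[hearer]:
            # Success: both agents collapse their inventories to the word.
            inventories[speaker] = {word}
            inventories[hearer] = {word}
            if has_consensus(inventories):
                return step, inventories
        else:
            # Failure: the hearer learns the new word.
            inventories[hearer].add(word)
    return None, inventories


if __name__ == "__main__":
    steps, final = naming_game_1d(num_agents=10, max_steps=1_000_000, seed=1)
    print(steps, final[0])
```

Recording the largest inventory size and the number of steps for several values of `num_agents` would be one way to probe the memory and time scalings quoted in the abstract, although small rings like this one are far from the asymptotic regime.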
A Novel Bayesian Classifier using Copula Functions
Pattern classification is an important task in several image processing, statistical learning, and data mining applications. The most popular pattern classifiers are Bayesian classifiers. There are many well-known methods for representing Bayesian classifiers, but one of the most useful is by discriminant functions. These functions provide inter-class decision surfaces for Bayesian classifiers. Discriminant functions assume several forms depending on the probability density of the feature space. However, most attention has gone to discriminant functions that assume a multivariate Gaussian distribution [1].
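To make the notion of a discriminant function concrete, here is a minimal sketch of the textbook Gaussian case, assuming a diagonal covariance so that no matrix inversion is needed: $g_i(x) = -\tfrac{1}{2}\sum_d \left[(x_d-\mu_{i,d})^2/\sigma_{i,d}^2 + \ln \sigma_{i,d}^2\right] + \ln P(\omega_i)$, with the class-independent constant dropped. This is the Gaussian baseline the abstract refers to, not the copula-based classifier the paper proposes; the function names and toy parameters are made up for the example.

```python
import math


def gaussian_discriminant(x, mean, variances, prior):
    """Log-posterior (up to a shared constant) under a diagonal Gaussian."""
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    log_det = sum(math.log(vi) for vi in variances)
    return -0.5 * quad - 0.5 * log_det + math.log(prior)


def classify(x, classes):
    """Pick the class whose discriminant function is largest at x.

    `classes` maps a label to a (mean, variances, prior) tuple.
    """
    return max(classes, key=lambda label: gaussian_discriminant(x, *classes[label]))


if __name__ == "__main__":
    toy_classes = {
        "a": ([0.0, 0.0], [1.0, 1.0], 0.5),
        "b": ([3.0, 3.0], [1.0, 1.0], 0.5),
    }
    print(classify([0.2, -0.1], toy_classes))
    print(classify([2.9, 3.5], toy_classes))
```

With equal priors and equal variances, as in the toy parameters, the decision surface between the two classes is the hyperplane equidistant from the two means.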
Nonlinear Estimators and Tail Bounds for Dimension Reduction in $l_1$ Using Cauchy Random Projections
Li, Ping, Hastie, Trevor J., Church, Kenneth W.
For dimension reduction in $l_1$, the method of Cauchy random projections multiplies the original data matrix $\mathbf{A} \in\mathbb{R}^{n\times D}$ with a random matrix $\mathbf{R} \in \mathbb{R}^{D\times k}$ ($k\ll\min(n,D)$) whose entries are i.i.d. samples of the standard Cauchy distribution $C(0,1)$. Because of known impossibility results, one cannot hope to recover the pairwise $l_1$ distances in $\mathbf{A}$ from $\mathbf{B} = \mathbf{AR} \in \mathbb{R}^{n\times k}$ using linear estimators without incurring large errors. However, nonlinear estimators are still useful for certain applications in data stream computation, information retrieval, learning, and data mining. We propose three types of nonlinear estimators: the bias-corrected sample median estimator, the bias-corrected geometric mean estimator, and the bias-corrected maximum likelihood estimator. The sample median estimator and the geometric mean estimator are asymptotically (as $k\to \infty$) equivalent but the latter is more accurate at small $k$. We derive explicit tail bounds for the geometric mean estimator and establish an analog of the Johnson-Lindenstrauss (JL) lemma for dimension reduction in $l_1$, which is weaker than the classical JL lemma for dimension reduction in $l_2$. Asymptotically, both the sample median estimator and the geometric mean estimator are about 80% efficient compared to the maximum likelihood estimator (MLE). We analyze the moments of the MLE and propose approximating the distribution of the MLE by an inverse Gaussian.
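The projection step and two of the nonlinear estimators can be sketched as follows. Each projected difference is Cauchy distributed with scale equal to the $l_1$ distance $d$ between the original rows, so the median of the absolute differences estimates $d$. The median below is left uncorrected, because the abstract does not give the bias correction. The geometric mean correction factor $\cos^k(\pi/(2k))$ is derived here from the moment identity $E|C|^{\lambda} = 1/\cos(\lambda\pi/2)$ for a standard Cauchy $C$ and $|\lambda|<1$; it is an assumption of this sketch that it matches the paper's form. All function names are illustrative.

```python
import math
import random
import statistics


def standard_cauchy(rng):
    """Draw one standard Cauchy sample via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_projection_matrix(dim, k, rng):
    """D x k matrix R with i.i.d. standard Cauchy entries."""
    return [[standard_cauchy(rng) for _ in range(k)] for _ in range(dim)]


def project(vector, R):
    """Compute the row vector `vector @ R` (length k)."""
    k = len(R[0])
    return [sum(v * row[j] for v, row in zip(vector, R)) for j in range(k)]


def median_estimate(diffs):
    """Sample median of |diffs| (no bias correction applied here)."""
    return statistics.median(abs(x) for x in diffs)


def geometric_mean_estimate(diffs):
    """Geometric mean of |diffs| scaled by cos(pi / (2k))^k; needs k > 1."""
    k = len(diffs)
    log_gm = sum(math.log(abs(x)) for x in diffs) / k
    return math.cos(math.pi / (2 * k)) ** k * math.exp(log_gm)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k = 20, 4000
    u = [rng.uniform(-1, 1) for _ in range(dim)]
    v = [rng.uniform(-1, 1) for _ in range(dim)]
    R = cauchy_projection_matrix(dim, k, rng)
    diffs = [a - b for a, b in zip(project(u, R), project(v, R))]
    true_l1 = sum(abs(a - b) for a, b in zip(u, v))
    print(true_l1, median_estimate(diffs), geometric_mean_estimate(diffs))
```

The value of `k` here is deliberately large, and larger than `dim`, so that both estimates land close to the true distance in a quick demonstration. In practice `k` would be much smaller than $D$, and the estimates correspondingly noisier.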