When operating in stochastic, partially observable, multiagent settings, it is crucial to accurately predict the actions of other agents. In my thesis work, I propose methodologies for learning the policy of external agents from their observed behavior, in the form of finite state controllers. To perform this task, I adopt Bayesian learning algorithms based on nonparametric prior distributions, that provide the flexibility required to infer models of unknown complexity. These methods are to be embedded in decision making frameworks for autonomous planning in partially observable multiagent systems.
We analyze the asymptotic behavior of agents engaged in an infinite horizon partially observable stochastic game as formalized by the interactive POMDP framework. We show that when agents' initial beliefs satisfy a truth compatibility condition, their behavior converges to a subjective ɛ-equilibrium in a finite time, and subjective equilibrium in the limit. This result is a generalization of a similar result in repeated games, to partially observable stochastic games. However, it turns out that the equilibrating process is difficult to demonstrate computationally because of the difficulty in coming up with initial beliefs that are both natural and satisfy the truth compatibility condition. Our results, therefore, shed some negative light on using equilibria as a solution concept for decision making in partially observable stochastic games.
In an earlier paper, a new theory of measurefree "conditional" objects was presented. In this paper, emphasis is placed upon the motivation of the theory. The central part of this motivation is established through an example involving a knowledge-based system. In order to evaluate combination of evidence for this system, using observed data, auxiliary at tribute and diagnosis variables, and inference rules connecting them, one must first choose an appropriate algebraic logic description pair (ALDP): a formal language or syntax followed by a compatible logic or semantic evaluation (or model). Three common choices- for this highly non-unique choice - are briefly discussed, the logics being Classical Logic, Fuzzy Logic, and Probability Logic. In all three,the key operator representing implication for the inference rules is interpreted as the often-used disjunction of a negation (b => a) = (b'v a), for any events a,b. However, another reasonable interpretation of the implication operator is through the familiar form of probabilistic conditioning. But, it can be shown - quite surprisingly - that the ALDP corresponding to Probability Logic cannot be used as a rigorous basis for this interpretation! To fill this gap, a new ALDP is constructed consisting of "conditional objects", extending ordinary Probability Logic, and compatible with the desired conditional probability interpretation of inference rules. It is shown also that this choice of ALDP leads to feasible computations for the combination of evidence evaluation in the example. In addition, a number of basic properties of conditional objects and the resulting Conditional Probability Logic are given, including a characterization property and a developed calculus of relations.
We present a theoretical analysis of Gaussian-binary restricted Boltzmann machines (GRBMs) from the perspective of density models. The key aspect of this analysis is to show that GRBMs can be formulated as a constrained mixture of Gaussians, which gives a much better insight into the model's capabilities and limitations. We show that GRBMs are capable of learning meaningful features both in a two-dimensional blind source separation task and in modeling natural images. Further, we show that reported difficulties in training GRBMs are due to the failure of the training algorithm rather than the model itself. Based on our analysis we are able to propose several training recipes, which allowed successful and fast training in our experiments. Finally, we discuss the relationship of GRBMs to several modifications that have been proposed to improve the model.
We develop a principled strategy to sample a function optimally for function approximation tasks within a Bayesian framework. Using ideas from optimal experiment design, we introduce an objective function (incorporating both bias and variance) to measure the degree ofapproximation, and the potential utility of the data points towards optimizing this objective. We show how the general strategy canbe used to derive precise algorithms to select data for two cases: learning unit step functions and polynomial functions. In particular, we investigate whether such active algorithms can learn the target with fewer examples. We obtain theoretical and empirical resultsto suggest that this is the case. 1 INTRODUCTION AND MOTIVATION Learning from examples is a common supervised learning paradigm that hypothesizes atarget concept given a stream of training examples that describes the concept. In function approximation, example-based learning can be formulated as synthesizing anapproximation function for data sampled from an unknown target function (Poggio and Girosi, 1990). Active learning describes a class of example-based learning paradigms that seeks out new training examples from specific regions of the input space, instead of passively accepting examples from some data generating source. By judiciously selecting ex- 594 KahKay Sung, Parlha Niyogi amples instead of allowing for possible random sampling, active learning techniques can conceivably have faster learning rates and better approximation results than passive learning methods. This paper presents a Bayesian formulation for active learning within the function approximation framework.