Bayesian Inference
Variational Program Inference
We introduce a framework for representing a variety of interesting problems as inference over the execution of probabilistic model programs. We represent a "solution" to such a problem as a guide program which runs alongside the model program and influences the model program's random choices, leading the model program to sample from a different distribution than from its priors. Ideally the guide program influences the model program to sample from the posteriors given the evidence. We show how the KL- divergence between the true posterior distribution and the distribution induced by the guided model program can be efficiently estimated (up to an additive constant) by sampling multiple executions of the guided model program. In addition, we show how to use the guide program as a proposal distribution in importance sampling to statistically prove lower bounds on the probability of the evidence and on the probability of a hypothesis and the evidence. We can use the quotient of these two bounds as an estimate of the conditional probability of the hypothesis given the evidence. We thus turn the inference problem into a heuristic search for better guide programs.
Using a Kernel Adatron for Object Classification with RCS Data
Byl, Marten F., Demers, James T., Rietman, Edward A.
Rapid identification of object from radar cross section (RCS) signals is important for many space and military applications. This identification is a problem in pattern recognition which either neural networks or support vector machines should prove to be high-speed. Bayesian networks would also provide value but require significant preprocessing of the signals. In this paper, we describe the use of a support vector machine for object identification from synthesized RCS data. Our best results are from data fusion of X-band and S-band signals, where we obtained 99.4%, 95.3%, 100% and 95.6% correct identification for cylinders, frusta, spheres, and polygons, respectively. We also compare our results with a Bayesian approach and show that the SVM is three orders of magnitude faster, as measured by the number of floating point operations.
Modeling Social Annotation: a Bayesian Approach
Plangprasopchok, Anon, Lerman, Kristina
Collaborative tagging systems, such as Delicious, CiteULike, and others, allow users to annotate resources, e.g., Web pages or scientific papers, with descriptive labels called tags. The social annotations contributed by thousands of users, can potentially be used to infer categorical knowledge, classify documents or recommend new relevant information. Traditional text inference methods do not make best use of social annotation, since they do not take into account variations in individual users' perspectives and vocabulary. In a previous work, we introduced a simple probabilistic model that takes interests of individual annotators into account in order to find hidden topics of annotated resources. Unfortunately, that approach had one major shortcoming: the number of topics and interests must be specified a priori. To address this drawback, we extend the model to a fully Bayesian framework, which offers a way to automatically estimate these numbers. In particular, the model allows the number of interests and topics to change as suggested by the structure of the data. We evaluate the proposed model in detail on the synthetic and real-world data by comparing its performance to Latent Dirichlet Allocation on the topic extraction task. For the latter evaluation, we apply the model to infer topics of Web resources from social annotations obtained from Delicious in order to discover new resources similar to a specified one. Our empirical results demonstrate that the proposed model is a promising method for exploiting social knowledge contained in user-generated annotations.
Some distance bounds of branching processes and their diffusion limits
Kammerer, Niels B., Stummer, Wolfgang
We compute exact values respectively bounds of "distances" - in the sense of (transforms of) power divergences and relative entropy - between two discrete-time Galton-Watson branching processes with immigration GWI for which the offspring as well as the immigration is arbitrarily Poisson-distributed (leading to arbitrary type of criticality). Implications for asymptotic distinguishability behaviour in terms of contiguity and entire separation of the involved GWI are given, too. Furthermore, we determine the corresponding limit quantities for the context in which the two GWI converge to Feller-type branching diffusion processes, as the time-lags between observations tend to zero. Some applications to (static random environment like) Bayesian decision making and Neyman-Pearson testing are presented as well.
Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC)
Peters, Gareth W., Hosack, Geoff R., Hayes, Keith R.
We develop a novel advanced Particle Markov chain Monte Carlo algorithm that is capable of sampling from the posterior distribution of non-linear state space models for both the unobserved latent states and the unknown model parameters. We apply this novel methodology to five population growth models, including models with strong and weak Allee effects, and test if it can efficiently sample from the complex likelihood surface that is often associated with these models. Utilising real and also synthetically generated data sets we examine the extent to which observation noise and process error may frustrate efforts to choose between these models. Our novel algorithm involves an Adaptive Metropolis proposal combined with an SIR Particle MCMC algorithm (AdPMCMC). We show that the AdPMCMC algorithm samples complex, high-dimensional spaces efficiently, and is therefore superior to standard Gibbs or Metropolis Hastings algorithms that are known to converge very slowly when applied to the non-linear state space ecological models considered in this paper. Additionally, we show how the AdPMCMC algorithm can be used to recursively estimate the Bayesian Cram\'er-Rao Lower Bound of Tichavsk\'y (1998). We derive expressions for these Cram\'er-Rao Bounds and estimate them for the models considered. Our results demonstrate a number of important features of common population growth models, most notably their multi-modal posterior surfaces and dependence between the static and dynamic parameters. We conclude by sampling from the posterior distribution of each of the models, and use Bayes factors to highlight how observation noise significantly diminishes our ability to select among some of the models, particularly those that are designed to reproduce an Allee effect.
Reasoning about Deterministic Actions with Probabilistic Prior and Application to Stochastic Filtering
Hajishirzi, Hannaneh (University of Illinois at Urbana-Champaign) | Amir, Eyal (University of Illinois at Urbana-Champaign)
We present a novel algorithm and a new understanding of reasoning about a sequence of deterministic actions with a probabilistic prior. When the initial state of a dynamic system is unknown, a probability distribution can be still specified over the initial states. Estimating the posterior distribution over states filtering after some deterministic actions occurred is a problem relevant to AI planning, natural language processing (NLP), and robotics among others. Current approaches to filtering deterministic actions are not tractable even if the distribution over the initial system state is represented compactly. The reason is that state variables become correlated after a few steps. The main innovation in this paper is a method for sidestepping this problem by redefining state variables dynamically at each time step such that the posterior for time t is represented in a factored form. This update is done using a progression algorithm as a subroutine, and our algorithm's tractability follows when that subroutine is tractable. Our results are for general deterministic actions and in particular, our algorithm is tractable for one-to-one and STRIPS actions. We apply our reasoning algorithm about deterministic actions to reasoning about sequences of probabilistic actions and improve the efficiency of the current probabilistic reasoning approaches. We demonstrate the efficiency of the new algorithm empirically over AI-Planning data sets.
Active Learning for Hidden Attributes in Networks
Yan, Xiaoran, Zhu, Yaojia, Rouquier, Jean-Baptiste, Moore, Cristopher
In many networks, vertices have hidden attributes, or types, that are correlated with the networks topology. If the topology is known but these attributes are not, and if learning the attributes is costly, we need a method for choosing which vertex to query in order to learn as much as possible about the attributes of the other vertices. We assume the network is generated by a stochastic block model, but we make no assumptions about its assortativity or disassortativity. We choose which vertex to query using two methods: 1) maximizing the mutual information between its attributes and those of the others (a well-known approach in active learning) and 2) maximizing the average agreement between two independent samples of the conditional Gibbs distribution. Experimental results show that both these methods do much better than simple heuristics. They also consistently identify certain vertices as important by querying them early on.
The Production of Probabilistic Entropy in Structure/Action Contingency Relations
Luhmann (1984) defined society as a communication system which is structurally coupled to, but not an aggregate of, human action systems. The communication system is then considered as self-organizing ("autopoietic"), as are human actors. Communication systems can be studied by using Shannon's (1948) mathematical theory of communication. The update of a network by action at one of the local nodes is then a well-known problem in artificial intelligence (Pearl 1988). By combining these various theories, a general algorithm for probabilistic structure/action contingency can be derived. The consequences of this contingency for each system, its consequences for their further histories, and the stabilization on each side by counterbalancing mechanisms are discussed, in both mathematical and theoretical terms. An empirical example is elaborated.
Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data
Shah, Mohak, Marchand, Mario, Corbeil, Jacques
One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. Performance guarantees become crucial for tasks such as microarray data analysis due to very small sample sizes resulting in limited empirical evaluation. To the best of our knowledge, such algorithms that give theoretical bounds on the future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches for gene identification from DNA microarray data and compare our results to those of well known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with much smaller number of genes while giving competitive classification accuracy but also have tight risk guarantees on future performance unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.
Planning for Concurrent Action Executions Under Action Duration Uncertainty Using Dynamically Generated Bayesian Networks
Beaudry, Eric (Universite de Sherbrooke) | Kabanza, Froduald (Universite de Sherbrooke) | Michaud, Francois (Universite de Sherbrooke)
An interesting class of planning domains, including planning for daily activities of Mars rovers, involves achievement of goals with time constraints and concurrent actions with probabilistic durations. Current probabilistic approaches, which rely on a discrete time model, introduce a blow up in the search state-space when the two factors of action concurrency and action duration uncertainty are combined. Simulation-based and sampling probabilistic planning approaches would cope with this state explosion by avoiding storing all the explored states in memory, but they remain approximate solution approaches. In this paper, we present an alternative approach relying on a continuous time model which avoids the state explosion caused by time stamping in the presence of action concurrency and action duration uncertainty. Time is represented as a continuous random variable. The dependency between state time variables is conveyed by a Bayesian network, which is dynamically generated by a state-based forward-chaining search based on the action descriptions. A generated plan is characterized by a probability of satisfying a goal. The evaluation of this probability is done by making a query the Bayesian network.