Goto

Collaborating Authors

 Uncertainty


Pointwise adaptation via stagewise aggregation of local estimates for multiclass classification

arXiv.org Machine Learning

We consider a problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb p(Y = m | X = x) = \theta_m(x)$, $1 \leq m \leq M$, and $\theta_1(x), \dots, \theta_M(x)$ are unknown Lipschitz functions. Given a test point $X$, our goal is to estimate $\theta_1(X), \dots, \theta_M(X)$. An approach based on nonparametric smoothing uses a localization technique, i.e. the weight of observation $(X_i, Y_i)$ depends on the distance between $X_i$ and $X$. However, local estimates strongly depend on localizing scheme. In our solution we fix several schemes $W_1, \dots, W_K$, compute corresponding local estimates $\widetilde\theta^{(1)}, \dots, \widetilde\theta^{(K)}$ for each of them and apply an aggregation procedure. We propose an algorithm, which constructs a convex combination of the estimates $\widetilde\theta^{(1)}, \dots, \widetilde\theta^{(K)}$ such that the aggregated estimate behaves approximately as well as the best one from the collection $\widetilde\theta^{(1)}, \dots, \widetilde\theta^{(K)}$. We also study theoretical properties of the procedure, prove oracle results and establish rates of convergence under mild assumptions.


Fast Conditional Independence Test for Vector Variables with Large Sample Sizes

arXiv.org Machine Learning

We present and evaluate the Fast (conditional) Independence Test (FIT) -- a nonparametric conditional independence test. The test is based on the idea that when $P(X \mid Y, Z) = P(X \mid Y)$, $Z$ is not useful as a feature to predict $X$, as long as $Y$ is also a regressor. On the contrary, if $P(X \mid Y, Z) \neq P(X \mid Y)$, $Z$ might improve prediction results. FIT applies to thousand-dimensional random variables with a hundred thousand samples in a fraction of the time required by alternative methods. We provide an extensive evaluation that compares FIT to six extant nonparametric independence tests. The evaluation shows that FIT has low probability of making both Type I and Type II errors compared to other tests, especially as the number of available samples grows. Our implementation of FIT is publicly available.


Lifted Stochastic Planning, Belief Propagation and Marginal MAP

AAAI Conferences

It is well known that the problems of stochastic planning and probabilistic inference are closely related. This paper makes several contributions in this context for factored spaces where the complexity of solutions is challenging. First, we analyze the recently developed SOGBOFA heuristic, which performs stochastic planning by building an explicit computation graph capturing an approximate aggregate simulation of the dynamics. It is shown that the values computed by this algorithm are identical to the approximation provided by Belief Propagation (BP). Second, as a consequence of this observation, we show how ideas on lifted BP can be used to develop a lifted version of SOGBOFA. Unlike implementations of lifted BP, Lifted SOGBOFA has a very simple implementation as a dynamic programming version of the original graph construction. Third, we show that the idea of graph construction for aggregate simulation can be used to solve marginal MAP (MMAP) problems in Bayesian networks, where MAP variables are constrained to be at roots of the network. This yields a novel algorithm for MMAP for this subclass. An experimental evaluation illustrates the advantage of Lifted SOGBOFA for planning.


Generalized Dual Decomposition for Bounding Maximum Expected Utility of Influence Diagrams with Perfect Recall

AAAI Conferences

We introduce a generalized dual decomposition bound for computing the maximum expected utility of influence diagrams based on the dual decomposition method generalized to $L^p$ space.ย  The main goal is to devise an approximation scheme free from translations required by existing variational approaches while exploiting the local structure of sum of utility functions as well as the conditional independence of probability functions.ย  In this work, the generalized dual decomposition method is applied to the algebraic framework called valuation algebra for influence diagrams which handles probability and expected utility as a pair. The proposed approach allows a sequential decision problem to be decomposed as a collection of sub-decision problems of bounded complexity and the upper bound of maximum expected utility to be computed by combining the local expected utilities. Thus, it has a flexible control of space and time complexity for computing the bound.ย  In addition, the upper bounds can be further minimized by reparameterizing the utility functions. Since the global objective function for the minimization is nonconvex, we present a gradient-based local search algorithm in which the outer loop controls the randomization of the initial configurations and the inner loop tightens the upper-bound based on block coordinate descent with gradients perturbed by a random noise. The experimental evaluation demonstrates highlights of the proposed approach on finite horizon MDP/POMDP instances.


Preventing Infectious Disease in Dynamic Populations under Uncertainty

AAAI Conferences

Treatable infectious diseases are a critical challenge for public health. Outreach campaigns can encourage undiagnosed patients to seek treatment but must be carefully targeted to make the most efficient use of limited resources. We present an algorithm to optimally allocate limited outreach resources among demographic groups in the population. The algorithm uses a novel multiagent model of disease spread which both captures the underlying population dynamics and is amenable to optimization. Our algorithm extends, with provable guarantees, to a stochastic setting where we have only a distribution over parameters such as the contact pattern between agents. We evaluate our algorithm on two instances where this distribution is inferred from real world data: tuberculosis in India and gonorrhea in the United States. Our algorithm produces a policy which is predicted to avert an average of least 8,000 person-years of tuberculosis and 20,000 person-years of gonorrhea annually compared to current policy.


Formal Ways for Measuring Relations between Concepts in Conceptual Spaces

arXiv.org Artificial Intelligence

The highly influential framework of conceptual spaces provides a geometric way of representing knowledge. Instances are represented by points in a high-dimensional space and concepts are represented by regions in this space. In this article, we extend our recent mathematical formalization of this framework by providing quantitative mathematical definitions for measuring relations between concepts: We develop formal ways for computing concept size, subsethood, implication, similarity, and betweenness. This considerably increases the representational capabilities of our formalization and makes it the most thorough and comprehensive formalization of conceptual spaces developed so far.


Variational Rejection Sampling

arXiv.org Machine Learning

Learning latent variable models with stochastic variational inference is challenging when the approximate posterior is far from the true posterior, due to high variance in the gradient estimates. We propose a novel rejection sampling step that discards samples from the variational posterior which are assigned low likelihoods by the model. Our approach provides an arbitrarily accurate approximation of the true posterior at the expense of extra computation. Using a new gradient estimator for the resulting unnormalized proposal distribution, we achieve average improvements of 3.71 nats and 0.21 nats over state-of-the-art single-sample and multi-sample alternatives respectively for estimating marginal log-likelihoods using sigmoid belief networks on the MNIST dataset.


The Kanerva Machine: A Generative Distributed Memory

arXiv.org Machine Learning

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal on-line compression via a Bayesian update-rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.


Information Maximizing Exploration with a Latent Dynamics Model

arXiv.org Machine Learning

All reinforcement learning algorithms must handle the trade-off between exploration and exploitation. Many state-of-the-art deep reinforcement learning methods use noise in the action selection, such as Gaussian noise in policy gradient methods or $\epsilon$-greedy in Q-learning. While these methods are appealing due to their simplicity, they do not explore the state space in a methodical manner. We present an approach that uses a model to derive reward bonuses as a means of intrinsic motivation to improve model-free reinforcement learning. A key insight of our approach is that this dynamics model can be learned in the latent feature space of a value function, representing the dynamics of the agent and the environment. This method is both theoretically grounded and computationally advantageous, permitting the efficient use of Bayesian information-theoretic methods in high-dimensional state spaces. We evaluate our method on several continuous control tasks, focusing on improving exploration.


Gaussian Process Subset Scanning for Anomalous Pattern Detection in Non-iid Data

arXiv.org Machine Learning

Identifying anomalous patterns in real-world data is essential for understanding where, when, and how systems deviate from their expected dynamics. Yet methods that separately consider the anomalousness of each individual data point have low detection power for subtle, emerging irregularities. Additionally, recent detection techniques based on subset scanning make strong independence assumptions and suffer degraded performance in correlated data. We introduce methods for identifying anomalous patterns in non-iid data by combining Gaussian processes with novel log-likelihood ratio statistic and subset scanning techniques. Our approaches are powerful, interpretable, and can integrate information across multiple data streams. We illustrate their performance on numeric simulations and three open source spatiotemporal datasets of opioid overdose deaths, 311 calls, and storm reports.