Goto

Collaborating Authors

 probabilistic interpretation


Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

Neural Information Processing Systems

We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the WinnerTakes-All (WTA) loss for a set of hypotheses. In regression settings, the existing MCL variants focus on merging the hypotheses, thereby eventually sacrificing the diversity of the predictions. In contrast, our method relies on a novel learned scoring scheme underpinned by a mathematical framework based on Voronoi tessellations of the output space, from which we can derive a probabilistic interpretation. After empirically validating rMCL with experiments on synthetic data, we further assess its merits on the sound source localization task, demonstrating its practical usefulness and the relevance of its interpretation.



Bidirectional Recurrent Neural Networks as Generative Models

Neural Information Processing Systems

Bidirectional recurrent neural networks (RNN) are trained to predict both in the positive and negative time directions simultaneously. They have not been used commonly in unsupervised tasks, because a probabilistic interpretation of the model has been difficult. Recently, two different frameworks, GSN and NADE, provide a connection between reconstruction and probabilistic modeling, which makes the interpretation possible. As far as we know, neither GSN or NADE have been studied in the context of time series before.As an example of an unsupervised task, we study the problem of filling in gaps in high-dimensional time series with complex dynamics. Although unidirectional RNNs have recently been trained successfully to model such time series, inference in the negative time direction is non-trivial. We propose two probabilistic interpretations of bidirectional RNNs that can be used to reconstruct missing gaps efficiently. Our experiments on text data show that both proposed methods are much more accurate than unidirectional reconstructions, although a bit less accurate than a computationally complex bidirectional Bayesian inference on the unidirectional RNN. We also provide results on music data for which the Bayesian inference is computationally infeasible, demonstrating the scalability of the proposed methods.


Suitability of CCA for Generating Latent State/ Variables in Multi-View Textual Data

arXiv.org Artificial Intelligence

The probabilistic interpretation of Canonical Correlation Analysis (CCA) for learning low-dimensional real vectors, called as latent variables, has been exploited immensely in various fields. This study takes a step further by demonstrating the potential of CCA in discovering a latent state that captures the contextual information within the textual data under a two-view setting. The interpretation of CCA discussed in this study utilizes the multi-view nature of textual data, i.e. the consecutive sentences in a document or turns in a dyadic conversation, and has a strong theoretical foundation. Furthermore, this study proposes a model using CCA to perform the Automatic Short Answer Grading (ASAG) task. The empirical analysis confirms that the proposed model delivers competitive results and can even beat various sophisticated supervised techniques. The model is simple, linear, and adaptable and should be used as the baseline especially when labeled training data is scarce or nonexistent.


5ef0b4eba35ab2d6180b0bca7e46b6f9-Reviews.html

Neural Information Processing Systems

SUMMARY This paper studies the problem of low rank matrix completion which exists in many real-world applications such as collaborative filtering for recommender systems. A previous work (ref [4]) proposed a scalable algorithm called Soft-Impute for solving a convex optimization problem involving the nuclear norm as a regularizer. Like previous work such as probabilistic matrix factorization (PMF), this paper gives the problem a probabilistic interpretation by relating the (non-probabilistic) optimization problem to a MAP estimation problem. Different (concave) penalty functions of the nuclear norm are proposed and then an EM algorithm is proposed to solve the MAP estimation problem. The algorithms proposed in this paper are more general than the Soft-Impute algorithm proposed in [4] in that the latter comes as a particular case.


Probabilistic Multi-Task Feature Selection

Neural Information Processing Systems

Recently, some variants of the l_1 norm, particularly matrix norms such as the l_{1,2} and l_{1,\infty} norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the l_{1,2} and l_{1,\infty} norms by considering a family of l_{1,q} norms for 1 q\le\infty and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the l_{1,q} norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks.


Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

arXiv.org Machine Learning

We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existing MCL variants focus on merging the hypotheses, thereby eventually sacrificing the diversity of the predictions. In contrast, our method relies on a novel learned scoring scheme underpinned by a mathematical framework based on Voronoi tessellations of the output space, from which we can derive a probabilistic interpretation. After empirically validating rMCL with experiments on synthetic data, we further assess its merits on the sound source localization problem, demonstrating its practical usefulness and the relevance of its interpretation.


Safe Reinforcement Learning as Wasserstein Variational Inference: Formal Methods for Interpretability

arXiv.org Artificial Intelligence

Reinforcement Learning or optimal control can provide effective reasoning for sequential decision-making problems with variable dynamics. Such reasoning in practical implementation, however, poses a persistent challenge in interpreting the reward function and corresponding optimal policy. Consequently, formalizing the sequential decision-making problems as inference has a considerable value, as probabilistic inference in principle offers diverse and powerful mathematical tools to infer the stochastic dynamics whilst suggesting a probabilistic interpretation of the reward design and policy convergence. In this study, we propose a novel Adaptive Wasserstein Variational Optimization (AWaVO) to tackle these challenges in sequential decision-making. Our approach utilizes formal methods to provide interpretations of reward design, transparency of training convergence, and probabilistic interpretation of sequential decisions. To demonstrate practicality, we show convergent training with guaranteed global convergence rates not only in simulation but also in real robot tasks, and empirically verify a reasonable tradeoff between high performance and conservative interpretability.


A Probabilistic Interpretation of SVMs with an Application to Unbalanced Classification

Neural Information Processing Systems

In this paper, we show that the hinge loss can be interpreted as the neg-log-likelihood of a semi-parametric model of posterior probabilities. From this point of view, SVMs represent the parametric component of a semi-parametric model fitted by a maximum a posteriori estimation pro- cedure. This connection enables to derive a mapping from SVM scores to estimated posterior probabilities. Unlike previous proposals, the sug- gested mapping is interval-valued, providing a set of posterior probabil- ities compatible with each SVM score. This framework offers a new way to adapt the SVM optimization problem to unbalanced classifica- tion, when decisions result in unequal (asymmetric) losses.


Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

arXiv.org Artificial Intelligence

Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of MI is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we present a unifying view of existing MI bounds from the perspective of importance sampling, and propose three novel bounds based on this approach. Since accurate estimation of MI without density information requires a sample size exponential in the true MI, we assume either a single marginal or the full joint density information is known. In settings where the full joint density is available, we propose Multi-Sample Annealed Importance Sampling (AIS) bounds on MI, which we demonstrate can tightly estimate large values of MI in our experiments. In settings where only a single marginal distribution is known, we propose Generalized IWAE (GIWAE) and MINE-AIS bounds. Our GIWAE bound unifies variational and contrastive bounds in a single framework that generalizes InfoNCE, IWAE, and Barber-Agakov bounds. Our MINE-AIS method improves upon existing energy-based methods such as MINE-DV and MINE-F by directly optimizing a tighter lower bound on MI. MINE-AIS uses MCMC sampling to estimate gradients for training and Multi-Sample AIS for evaluating the bound. Our methods are particularly suitable for evaluating MI in deep generative models, since explicit forms of the marginal or joint densities are often available. We evaluate our bounds on estimating the MI of VAEs and GANs trained on the MNIST and CIFAR datasets, and showcase significant gains over existing bounds in these challenging settings with high ground truth MI.