Bayesian Inference
Deep Kernel Transfer in Gaussian Processes for Few-shot Learning
Patacchiola, Massimiliano, Turner, Jack, Crowley, Elliot J., Storkey, Amos
Here, we use the nomenclature derived from the meta-learning literature which is the most prevalent at time of writing. Let $\mathcal{S} = \{(x_l, y_l)\}_{l=1}^{L}$ be a support-set containing input-output pairs, with $L$ equal to one (1-shot) or five (5-shot), and $\mathcal{Q} = \{(x_m, y_m)\}_{m=1}^{M}$ be a query-set (sometimes referred to in the literature as a target-set), with $M$ typically one order of magnitude greater than $L$. For ease of notation, the support and query sets are grouped in a task $\mathcal{T} = \{\mathcal{S}, \mathcal{Q}\}$, with the dataset $\mathcal{D} = \{\mathcal{T}_n\}_{n=1}^{N}$ defined as a collection of such tasks. Models are trained on random tasks sampled from $\mathcal{D}$. Then, given a new task $\mathcal{T}_* = \{\mathcal{S}_*, \mathcal{Q}_*\}$ sampled from a test set, the objective is to condition the model on the samples of the support $\mathcal{S}_*$ to estimate the membership of the samples in the query set $\mathcal{Q}_*$. In the most common scenario, the inputs $x \in \mathcal{D}$ belong to the same distribution $p(x)$ and are distributed across training, validation, and test sets such that their class membership is non-overlapping. Note that $y$ can be a continuous value (regression) or a discrete one (classification), even though most of the previous work has focused on classification. We also consider the cross-domain scenario, where the inputs are sampled from different distributions at training and test time; this is more representative of real-world scenarios.
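The split of a task into a support set and a query set can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sample_task`, the `n_way` parameter (number of classes per task), and the toy integer data are all introduced here for the example.

```python
import random
from collections import defaultdict

def sample_task(inputs, labels, n_way=5, n_support=1, n_query=15, rng=None):
    """Draw one classification task: a support set and a disjoint query set."""
    rng = rng or random.Random()
    # Group the inputs by class label.
    by_class = defaultdict(list)
    for x, y in zip(inputs, labels):
        by_class[y].append(x)
    # Only classes with enough examples can fill both sets.
    eligible = [c for c, xs in by_class.items() if len(xs) >= n_support + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        # Sample without replacement so support and query never share an input.
        picked = rng.sample(by_class[c], n_support + n_query)
        support += [(x, c) for x in picked[:n_support]]
        query += [(x, c) for x in picked[n_support:]]
    return support, query

# Hypothetical toy data: 10 classes with 20 integer "inputs" each.
inputs = list(range(200))
labels = [i // 20 for i in inputs]
support, query = sample_task(inputs, labels, n_way=5, n_support=1, n_query=15,
                             rng=random.Random(0))
```

With one support example and fifteen query examples per class, the query set is an order of magnitude larger than the support set, matching the proportions described in the abstract.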
ABCDP: Approximate Bayesian Computation Meets Differential Privacy
Park, Mijung, Jitkrittum, Wittawat
We develop a novel approximate Bayesian computation (ABC) framework, ABCDP, that obeys the notion of differential privacy (DP). Under our framework, simply performing ABC inference with a mild modification yields differentially private posterior samples. We theoretically analyze the interplay between the ABC similarity threshold $\epsilon_{abc}$ (for comparing the similarity between real and simulated data) and the resulting privacy level $\epsilon_{dp}$ of the posterior samples, in two types of frequently-used ABC algorithms. We apply ABCDP to simulated data as well as privacy-sensitive real data. The results suggest that tuning the similarity threshold $\epsilon_{abc}$ helps us obtain a better privacy and accuracy trade-off.
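To make the role of the similarity threshold concrete, here is a minimal sketch of plain rejection ABC. It is the standard non-private algorithm, not the ABCDP mechanism, and it carries no privacy guarantee. The function name `abc_rejection`, the Gaussian toy model, the uniform prior, and the choice of the sample mean as summary statistic are all assumptions made for illustration.

```python
import numpy as np

def abc_rejection(observed, prior_sampler, simulator, distance, eps_abc, n_draws, rng):
    """Keep prior draws whose simulated data lie within eps_abc of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)           # propose a parameter from the prior
        simulated = simulator(theta, rng)    # simulate a dataset under that parameter
        if distance(observed, simulated) <= eps_abc:
            accepted.append(theta)           # accept: simulation resembles the data
    return np.array(accepted)

rng = np.random.default_rng(0)
# Hypothetical observed data: 100 draws from a unit-variance Gaussian with mean 2.
observed = rng.normal(loc=2.0, scale=1.0, size=100)

samples = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda theta, r: r.normal(loc=theta, scale=1.0, size=100),
    # Compare datasets through their sample means (the summary statistic).
    distance=lambda a, b: abs(a.mean() - b.mean()),
    eps_abc=0.2,
    n_draws=5000,
    rng=rng,
)
```

A smaller `eps_abc` accepts fewer draws but concentrates them closer to the exact posterior. A larger value accepts more draws at the cost of a coarser approximation.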
A Nonparametric Bayesian Model for Sparse Temporal Multigraphs
Ghalebi, Elahe, Mahyar, Hamidreza, Grosu, Radu, Taylor, Graham W., Williamson, Sinead A.
As the availability and importance of temporal interaction data--such as email communication--increases, it becomes increasingly important to understand the underlying structure that underpins these interactions. Often these interactions form a multigraph, where we might have multiple interactions between two entities. Such multigraphs tend to be sparse yet structured, and their distribution often evolves over time. Existing statistical models with interpretable parameters can capture some, but not all, of these properties. We propose a dynamic nonparametric model for interaction multigraphs that combines the sparsity of edge-exchangeable multigraphs with dynamic clustering patterns that tend to reinforce recent behavioral patterns. We show that our method yields improved held-out likelihood over stationary variants, and impressive predictive performance against a range of state-of-the-art dynamic graph models.
Customizing Sequence Generation with Multi-Task Dynamical Systems
Bird, Alex, Williams, Christopher K. I.
Dynamical system models (including RNNs) often lack the ability to adapt the sequence generation or prediction to a given context, limiting their real-world application. In this paper we show that hierarchical multi-task dynamical systems (MTDSs) provide direct user control over sequence generation, via use of a latent code $\mathbf{z}$ that specifies the customization to the individual data sequence. This enables style transfer, interpolation and morphing within generated sequences. We show the MTDS can improve predictions via latent code interpolation, and avoid the long-term performance degradation of standard RNN approaches.
Distributed Bayesian Computation for Model Choice
Buchholz, Alexander, Ahfock, Daniel, Richardson, Sylvia
We propose a general method for distributed Bayesian model choice, where each worker has access only to non-overlapping subsets of the data. Our approach approximates the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset generating a model evidence per subset. The model evidences per worker are then consistently combined using a novel approach which corrects for the splitting using summary statistics of the generated samples. This divide-and-conquer approach allows Bayesian model choice in the large data setting, exploiting all available information but limiting communication between workers. Our work thereby complements the work on consensus Monte Carlo (Scott et al., 2016) by explicitly enabling model choice. In addition, we show how the suggested approach can be extended to model choice within a reversible jump setting that explores multiple models within one run.
Deep Structured Mixtures of Gaussian Processes
Trapp, Martin, Peharz, Robert, Pernkopf, Franz, Rasmussen, Carl E.
Gaussian Processes (GPs) are powerful non-parametric Bayesian regression models that allow exact posterior inference, but exhibit high computational and memory costs. In order to improve scalability of GPs, approximate posterior inference is frequently employed, where a prominent class of approximation techniques is based on local GP experts. However, the local-expert techniques proposed so far are either not well-principled, come with limited approximation guarantees, or lead to intractable models. In this paper, we introduce deep structured mixtures of GP experts, a stochastic process model which i) allows exact posterior inference, ii) has attractive computational and memory costs, and iii), when used as GP approximation, captures predictive uncertainties consistently better than previous approximations. In a variety of experiments, we show that deep structured mixtures have a low approximation error and outperform existing expert-based approaches.
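For context on the cost that motivates such approximations, the following is a minimal sketch of exact GP regression with a single global GP, not the deep structured mixture proposed in the paper. The squared-exponential kernel, the fixed hyperparameters, the names `rbf_kernel` and `gp_posterior`, and the toy sine data are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Exact posterior mean and covariance of the latent function at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # The Cholesky factorization dominates the cost: O(n^3) time, O(n^2) memory.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov

# Hypothetical toy data: noiseless samples of a sine curve.
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.0, 2.5, 4.0])
mean, cov = gp_posterior(x_train, y_train, x_test)
```

Local-expert methods reduce this cost by fitting GPs on subsets of the data, so that each factorization involves a much smaller kernel matrix.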
Learning beyond Predefined Label Space via Bayesian Nonparametric Topic Modelling
Du, Changying, Zhuang, Fuzhen, He, Jia, He, Qing, Long, Guoping
In real-world machine learning applications, testing data may contain some meaningful new categories that have not been seen in labeled training data. To simultaneously recognize new data categories and assign the most appropriate category labels to the data actually from known categories, existing models assume the number of unknown new categories is pre-specified, though it is difficult to determine in advance. In this paper, we propose a Bayesian nonparametric topic model to automatically infer this number, based on the hierarchical Dirichlet process and the notion of latent Dirichlet allocation. Exact inference in our model is intractable, so we provide an efficient collapsed Gibbs sampling algorithm for approximate posterior inference. Extensive experiments on various text data sets show that: (a) compared with parametric approaches that use the pre-specified true number of new categories, the proposed nonparametric approach can yield comparable performance; and (b) when the exact number of new categories is unavailable, i.e. the parametric approaches only have a rough idea about the new categories, our approach has evident performance advantages.
Learning from Indirect Observations
Zhang, Yivan, Charoenphakdee, Nontawat, Sugiyama, Masashi
Weakly-supervised learning is a paradigm for alleviating the scarcity of labeled data by leveraging lower-quality but larger-scale supervision signals. While existing work mainly focuses on utilizing a certain type of weak supervision, we present a probabilistic framework, learning from indirect observations, for learning from a wide range of weak supervision in real-world problems, e.g., noisy labels, complementary labels and coarse-grained labels. We propose a general method based on the maximum likelihood principle, which has desirable theoretical properties and can be straightforwardly implemented for deep neural networks. Concretely, a discriminative model for the true target is used for modeling the indirect observation, which is a random variable entirely depending on the true target stochastically or deterministically. Then, maximizing the likelihood given indirect observations leads to an estimator of the true target implicitly. Comprehensive experiments for two novel problem settings, learning from multiclass label proportions and learning from coarse-grained labels, illustrate the practical usefulness of our method and demonstrate how to integrate various sources of weak supervision.
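One way to read the maximum likelihood construction above is as a marginalization over the unobserved true target. The notation here is an assumption, not taken from the abstract: $x$ is the input, $y$ the discrete true target, $z$ the indirect observation, and $\theta$ the parameters of the discriminative model. The sketch also assumes that $z$ depends on $x$ only through $y$.

$$p(z \mid x; \theta) = \sum_{y} p(z \mid y)\, p(y \mid x; \theta), \qquad \hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \log p(z_i \mid x_i; \theta).$$

For coarse-grained labels, for instance, $p(z \mid y)$ would be deterministic: it equals one when the fine class $y$ belongs to the coarse group $z$ and zero otherwise, so the sum reduces to the total predicted probability of the fine classes inside the observed group.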
Causality and deceit: Do androids watch action movies?
Pavlovic, Dusko, Pavlovic, Temra
We seek causes through science, religion, and in everyday life. We get excited when a big rock causes a big splash, and we get scared when it tumbles without a cause. But our causal cognition is usually biased. The 'why' is influenced by the 'who'. It is influenced by the 'self', and by 'others'. We share rituals, we watch action movies, and we influence each other to believe in the same causes. The human mind is packed with subjectivity because shared cognitive biases bring us together. But they also make us vulnerable. An artificial mind is deemed to be more objective than the human mind. After many years of science-fiction fantasies about even-minded androids, they are now sold as personal or expert assistants, as brand advocates, as policy or candidate supporters, as network influencers. Artificial agents have been stunningly successful in disseminating artificial causal beliefs among humans. As malicious artificial agents continue to manipulate human cognitive biases, and deceive human communities into ostensive but expansive causal illusions, the hope for defending us has been vested into developing benevolent artificial agents, tasked with preventing and mitigating cognitive distortions inflicted upon us by their malicious cousins. Can the distortions of human causal cognition be corrected on a more solid foundation of artificial causal cognition? In the present paper, we study a simple model of causal cognition, viewed as a quest for causal models. We show that, under very mild and hard to avoid assumptions, there are always self-confirming causal models, which perpetrate self-deception, and seem to preclude a royal road to objectivity.
Thomas Bayes - Wikipedia
Thomas Bayes (/beɪz/; c. 1701 – 7 April 1761) was an English statistician, philosopher and Presbyterian minister who is known for formulating a specific case of the theorem that bears his name: Bayes' theorem. Bayes never published what would become his most famous accomplishment; his notes were edited and published after his death by Richard Price. Thomas Bayes was the son of London Presbyterian minister Joshua Bayes, and was possibly born in Hertfordshire. He came from a prominent nonconformist family from Sheffield. In 1719, he enrolled at the University of Edinburgh to study logic and theology. On his return around 1722, he assisted his father at the latter's chapel in London before moving to Tunbridge Wells, Kent, around 1734.