 Bayesian Learning


Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra

Neural Information Processing Systems

The most widely used technology to identify the proteins present in a complex biological sample is tandem mass spectrometry, which quickly produces a large collection of spectra representative of the peptides (i.e., protein subsequences) present in the original sample. In this work, we greatly expand the parameter learning capabilities of a dynamic Bayesian network (DBN) peptide-scoring algorithm, Didea, by deriving emission distributions for which its conditional log-likelihood scoring function remains concave. We show that this class of emission distributions, called Convex Virtual Emissions (CVEs), naturally generalizes the log-sum-exp function while rendering both maximum likelihood estimation and conditional maximum likelihood estimation concave for a wide range of Bayesian networks. Utilizing CVEs in Didea allows efficient learning of a large number of parameters while ensuring global convergence, in stark contrast to Didea's previous parameter learning framework (which could only learn a single parameter using a costly grid search) and other trainable models (which only ensure convergence to local optima). The newly trained scoring function substantially outperforms the state-of-the-art in both scoring function accuracy and downstream Fisher kernel analysis.
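
For readers unfamiliar with the log-sum-exp function that the abstract says CVEs generalize, the following is a minimal sketch of that function alone, not of CVEs or of Didea's scoring function. Log-sum-exp is convex, so its negation is concave. The helper names `log_sum_exp` and `midpoint_gap` are illustrative and do not come from the paper.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)  # subtract the maximum so exp() cannot overflow
    return m + math.log(sum(math.exp(x - m) for x in xs))


def midpoint_gap(a, b):
    """Average of the endpoint values minus the value at the midpoint.

    For a convex function this gap is never negative.
    """
    mid = [(x + y) / 2.0 for x, y in zip(a, b)]
    return 0.5 * (log_sum_exp(a) + log_sum_exp(b)) - log_sum_exp(mid)


if __name__ == "__main__":
    print(log_sum_exp([0.0, 0.0]))          # equals log(2)
    print(log_sum_exp([1000.0, 1000.0]))    # finite thanks to the max shift
    print(midpoint_gap([0.0, 3.0], [2.0, -1.0]))
```

The midpoint check is only a spot test of convexity at one pair of points, not a proof.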


Reviews: Differentially private Bayesian learning on distributed data

Neural Information Processing Systems

This paper develops a method for differentially private (DP) Bayesian learning in a distributed setting, where the data is split across multiple clients. This differs from the traditional DP Bayesian learning setting, in which a single party has access to the full dataset. The main issue is that applying DP methods separately on each client would add too much noise; the goal is then to add an appropriate amount of noise in this setting without compromising privacy. To solve this, the authors introduce a method that combines existing DP Bayesian learning methods with a secure multi-party communication method called the DCA algorithm. Theoretically, the paper shows that the method satisfies differential privacy.
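
The noise argument above can be made concrete with a small sketch. It assumes a sum query and Gaussian noise, and it assumes that some secure aggregation step reveals only the final sum; it is not the paper's DCA algorithm, and all function names are hypothetical. If each of n clients independently adds noise of variance sigma^2, the sum carries variance n * sigma^2. If instead each client adds noise of variance sigma^2 / n and only the sum is revealed, the total variance is sigma^2.

```python
import random


def noisy_sum_independent(values, sigma, rng):
    """Each client perturbs its own value with N(0, sigma^2) before sharing."""
    return sum(v + rng.gauss(0.0, sigma) for v in values)


def noisy_sum_shared(values, sigma, rng):
    """Each client adds N(0, sigma^2 / n); only the sum is assumed to be revealed."""
    n = len(values)
    per_client_std = sigma / n ** 0.5
    return sum(v + rng.gauss(0.0, per_client_std) for v in values)


def total_noise_variance(n_clients, sigma, shared):
    """Analytic variance of the noise in the released sum."""
    return sigma ** 2 if shared else n_clients * sigma ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    values = [1.0] * 10
    print(noisy_sum_independent(values, 2.0, rng))
    print(noisy_sum_shared(values, 2.0, rng))
    print(total_noise_variance(10, 2.0, shared=False))
    print(total_noise_variance(10, 2.0, shared=True))
```

The sketch says nothing about how sigma should be calibrated to a privacy budget; it only shows why per-client noise scales badly with the number of clients.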


Reviews: Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning

Neural Information Processing Systems

This paper presents a sampling method that combines Hamiltonian Monte Carlo (HMC), mini-batches, tempering, and thermostats to explore multimodal target distributions more efficiently. It is demonstrated on a number of substantial neural network problems using real data sets. This is an interesting method, and the empirical results are quite substantial. Figure 2 clearly demonstrates that omitting any one of the ingredients (e.g., the tempering or the thermostat) is detrimental to the overall result, which illustrates how well the combination works together. This is followed by some substantial image classification examples.
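
As background for the HMC ingredient only, here is a minimal sketch of plain one-dimensional HMC with a leapfrog integrator and a Metropolis correction. It deliberately omits the mini-batching, tempering, and thermostat that the paper adds, and the function names and the standard-normal example target are assumptions made for illustration.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Integrate Hamiltonian dynamics for potential U with unit mass."""
    p -= 0.5 * step * grad_u(q)          # initial half step for momentum
    for i in range(n_steps):
        q += step * p                    # full step for position
        if i < n_steps - 1:
            p -= step * grad_u(q)        # full step for momentum
    p -= 0.5 * step * grad_u(q)          # final half step for momentum
    return q, p


def hmc_step(q, u, grad_u, step, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.gauss(0.0, 1.0)
    q_new, p_new = leapfrog(q, p0, grad_u, step, n_steps)
    h_old = u(q) + 0.5 * p0 * p0
    h_new = u(q_new) + 0.5 * p_new * p_new
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    u = lambda q: 0.5 * q * q            # standard normal target (assumed example)
    grad_u = lambda q: q
    q, samples = 0.0, []
    for _ in range(2000):
        q = hmc_step(q, u, grad_u, 0.1, 10, rng)
        samples.append(q)
    print(sum(samples) / len(samples))
```

On a unimodal target like this one plain HMC mixes well; the multimodal targets discussed in the review are where the extra ingredients are meant to help.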


Reviews: Dirichlet belief networks for topic structure learning

Neural Information Processing Systems

This submission proposes a new prior on the topic-word distribution in latent topic models. This model defines a multi-layer feedforward graph, where each layer contains a set of valid multinomial distributions over the vocabulary, and weighted combinations of each layer's "topics" are used as the Dirichlet prior for the "topics" of the next layer. The key purported benefits are sharing of statistical strength, inference of a hierarchy of interpretable "abstract" topics, and modularity that allows composition with other topic model variants that modify the document-topic distributions. The authors present an efficient, fully collapsed Gibbs sampler inference scheme; I did not thoroughly check the derivation, but it seems plausible. However, what is the computational complexity (and relative "wall clock" cost) of the given inference scheme?
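
The layer-to-layer construction described above can be sketched as a forward (generative) sampler. This is only an illustration of "weighted combinations of one layer's topics used as the Dirichlet parameter for the next layer"; it is not the authors' Gibbs sampler, and the function names, the fixed weights, and the tiny vocabulary are all assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def next_layer(topics, weights, rng):
    """Sample the next layer of topics.

    topics:  list of K1 distributions over a vocabulary of size V
    weights: K2 rows of K1 positive weights (hypothetical values here)
    """
    vocab_size = len(topics[0])
    layer = []
    for row in weights:
        # Dirichlet parameter = weighted combination of the current layer's topics
        alpha = [sum(w * t[v] for w, t in zip(row, topics)) for v in range(vocab_size)]
        layer.append(sample_dirichlet(alpha, rng))
    return layer


if __name__ == "__main__":
    rng = random.Random(0)
    layer1 = [sample_dirichlet([1.0] * 4, rng) for _ in range(3)]
    weights = [[5.0, 1.0, 1.0], [1.0, 5.0, 1.0]]
    for topic in next_layer(layer1, weights, rng):
        print([round(p, 3) for p in topic])
```

Larger weights concentrate a topic around the combination it is built from, which is one way to picture the sharing of statistical strength mentioned in the review.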


Reviews: Policy Gradient With Value Function Approximation For Collective Multiagent Planning

Neural Information Processing Systems

The paper presents a policy gradient algorithm for a multiagent cooperative problem, modeled in a formalism (CDEC-POMDP) whose dynamics, like congestion games, depend on groups of agents rather than individuals. This paper follows the theme of several similar advances in this field of complex multiagent planning, using factored models to propose practical/tractable approximations. The novelty here is the use of parameterized policies and training algorithms inspired by reinforcement learning (policy gradients). The work is well-motivated, relevant, and particularly well-presented. The theoretical results are new and important.


Reviews: Bayesian Model-Agnostic Meta-Learning

Neural Information Processing Systems

Summary: Meta-learning is motivated by the promise of transferring knowledge from previous learning experiences to new task settings, so that a new task can be learned more effectively from few observations. Yet updating highly parametric models with small amounts of data can easily lead to overfitting. A promising avenue for overcoming this challenge is a Bayesian treatment of meta-learning. This work builds on recent work that provides a Bayesian interpretation of MAML (model-agnostic meta-learning). The contribution is a direct extension of (Grant et al 2018), where the task-train posterior was approximated via a Gaussian distribution. Applying SVGD instead allows for a more flexible and (potentially) more accurate approximation of a highly complex posterior.
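
To show what an SVGD particle update looks like, here is a minimal one-dimensional sketch with an RBF kernel of fixed bandwidth. It illustrates SVGD in general, not the paper's meta-learning procedure; the Gaussian example target, the bandwidth, and the step size are assumptions.

```python
import math


def svgd_step(particles, score, step, bandwidth=1.0):
    """One SVGD update for scalar particles.

    score(x) is the gradient of the log target density at x.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / (2.0 * bandwidth ** 2))
            grad_k = -(xj - xi) / bandwidth ** 2 * k  # derivative of k(xj, xi) w.r.t. xj
            # attraction towards high density plus repulsion between particles
            phi += k * score(xj) + grad_k
        updated.append(xi + step * phi / n)
    return updated


if __name__ == "__main__":
    score = lambda x: -(x - 2.0)  # log-density gradient of N(2, 1), an assumed target
    particles = [-1.0 + 0.2 * i for i in range(11)]
    for _ in range(1000):
        particles = svgd_step(particles, score, 0.1)
    print(sum(particles) / len(particles))
    print(min(particles), max(particles))
```

The repulsion term is what keeps the particles from collapsing onto the mode, which is why a set of particles can represent a more flexible posterior than a single Gaussian.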


Reviews: Deep Generative Markov State Models

Neural Information Processing Systems

This paper proposes a novel learning framework for Markov State Models of real-valued vectors. This model can handle metastable processes, i.e., processes that evolve locally on short time-scales but switch between a few clusters after very long periods. The proposed framework is based on a nice idea: decompose the transition from x1 to x2 into the probability that x1 belongs to a long-lived state and a distribution of x2 given the state. The first conditional probability is modeled using a decoding deep network, whereas the second one can be represented either using a network that assigns weights to x2 or using a generative neural network. This is a very interesting manuscript.
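
The decomposition described above can be sketched as a two-stage sampler: first pick a long-lived state with a probability that depends on x1, then draw x2 from that state's distribution. In the sketch below, a hand-written softmax stands in for the first network and a fixed Gaussian per state stands in for the generative part; both, along with every name and constant, are assumptions for illustration rather than the paper's model.

```python
import math
import random

STATE_MEANS = [-1.0, 1.0]  # hypothetical centres of two long-lived states


def chi(x):
    """Soft assignment of x to each state (stand-in for the learned network)."""
    logits = [-(x - m) ** 2 for m in STATE_MEANS]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_next(x, rng, noise=0.1):
    """Draw x2 given x1: choose a state from chi(x1), then sample around it."""
    probs = chi(x)
    state = rng.choices(range(len(probs)), weights=probs)[0]
    return rng.gauss(STATE_MEANS[state], noise)


if __name__ == "__main__":
    rng = random.Random(0)
    x, path = -1.0, []
    for _ in range(200):
        x = sample_next(x, rng)
        path.append(x)
    switches = sum(1 for a, b in zip(path, path[1:]) if (a < 0) != (b < 0))
    print(switches)
```

Because chi(x) strongly favours the state nearest to x, the simulated trajectory tends to stay in one cluster for many steps before switching, which is the metastable behaviour the review refers to.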


Reviews: Multiscale Semi-Markov Dynamics for Intracortical Brain-Computer Interfaces

Neural Information Processing Systems

The paper describes a novel brain-computer-interface algorithm for controlling the movement of a cursor to random locations on a screen using neuronal activity (power in the "spike-spectrum" of selected intra-cortically implanted electrodes). The algorithm uses a dynamic Bayesian network model that encodes the possible target location (from a set of possible positions on a 40x40 grid, laid out on the screen). Target changes can only occur once a countdown timer reaches zero (time intervals are drawn at random), at which time the target has a chance of switching location. Observations (power in the spike spectrum) are assumed to be drawn from a multimodal distribution (a mixture of von Mises functions), since multiple neurons may affect the power recorded on a single electrode, and they depend on the current movement direction. The position is simply the integration over time of the movement direction variable (with a bit of decay).
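
The countdown mechanism described above can be sketched as a small simulation of the target variable alone. The uniform dwell-time distribution, the switch probability, and all names are assumptions; the review states only that intervals are drawn at random and that the target may switch when the timer reaches zero.

```python
import random


def simulate_targets(n_steps, n_targets, max_dwell, switch_prob, rng):
    """Simulate (target, counter) pairs under a countdown-gated switching rule."""
    target = rng.randrange(n_targets)
    counter = rng.randint(1, max_dwell)
    history = []
    for _ in range(n_steps):
        history.append((target, counter))
        if counter > 0:
            counter -= 1                          # target is frozen while counting down
        else:
            if rng.random() < switch_prob:        # switch is possible only at zero
                target = rng.randrange(n_targets)
            counter = rng.randint(1, max_dwell)   # draw a new random interval
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    history = simulate_targets(50, 40 * 40, 8, 0.7, rng)
    print(history[:12])
```

Gating the switch on the timer is what makes the process semi-Markov: the chance of a target change depends on how long the current target has been held, not only on the current target.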


Reviews: Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity

Neural Information Processing Systems

In particular, it establishes that, as long as the noise is homoscedastic, the GBN can be recovered efficiently under milder minimality/faithfulness assumptions. Clarity: The paper is heavy on notation, but everything is explained and organized clearly.


Reviews: Cluster Variational Approximations for Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

Neural Information Processing Systems

The paper introduces a generalization of previous variational methods for inference with jump processes; here, the proposed approximating measure to the posterior relies on a star approximation. Applied to continuous-time Bayesian networks, this means isolating clusters of nodes across children and parents in order to build an efficient approximation to the traditional variational lower bound. The paper further presents examples and experiments that show how the proposed approach can be adapted to structure learning tasks in continuous-time settings. This is an interesting and topical contribution likely to appeal to the statistical and probabilistic community within NIPS. The paper is, overall, well written and reasonably well structured. It offers good background on previous work and helps the reader understand its relevance and place its results in context within the existing literature.