Goto

Collaborating Authors

 David Blei


Structured Embedding Models for Grouped Data

Neural Information Processing Systems

We study how the word usage of U.S. Congressional speeches varies across states and party affiliation, how words are used differently across sections of the ArXiv, and how the copurchase patterns of groceries can vary across seasons. Key to the success of our method is that the groups share statistical information. We develop two sharing strategies: hierarchical modeling and amortization. We demonstrate the benefits of this approach in empirical studies of speeches, abstracts, and shopping baskets.


Context Selection for Embedding Models

Neural Information Processing Systems

Word embeddings are an effective tool to analyze language. They have been recently extended to model other types of data beyond text, such as items in recommendation systems. Embedding models consider the probability of a target observation (a word or an item) conditioned on the elements in the context (other words or items). In this paper, we show that conditioning on all the elements in the context is not optimal. Instead, we model the probability of the target conditioned on a learned subset of the elements in the context. We use amortized variational inference to automatically choose this subset. Compared to standard embedding models, this method improves predictions and the quality of the embeddings.


Hierarchical Implicit Models and Likelihood-Free Variational Inference

Neural Information Processing Systems

Implicit probabilistic models are a flexible class of models defined by a simulation process for data. They form the basis for theories which encompass our understanding of the physical world. Despite this fundamental nature, the use of implicit models remains limited due to challenges in specifying complex latent structure in them, and in performing inferences in such models with large data sets.


Variational Inference via $\chi$ Upper Bound Minimization

Neural Information Processing Systems

It posits a family of approximating distributions q and finds the closest member to the exact posterior p. Closeness is usually measured via a divergence D(q||p) from q to p. While successful, this approach also has problems. Notably, it typically leads to underestimation of the posterior variance.