
Collaborating Authors

 gelman


Whodunnit: The Upstate Murder-Mystery Weekend

The New Yorker

The event has a storied history among mystery buffs; some of its first scripts were written by the celebrated author Donald E. Westlake, along with his wife Abby, and they often collaborated with notable writer friends, including Stephen King, Edward Gorey, and Isaac Asimov, on everything from performing to graphic design. A half century ago, few, if any, hotels offered "immersive theatre" as an amenity, and the Mystery Weekend became a hot ticket for city dwellers--the first weekend, in 1977, drew more than two hundred participants. Soon, mystery-solving events were de rigueur at many rural hotels, whose owners found that staging crime scenes was a surefire way to lure cosmopolitans to the country during the off-season. In 1992, the reporter Alessandra Stanley noted that the swelling glut of mystery parties came in three categories: serious, "in which participants form teams and spend two to three days"; semi-serious, which "take place in large hotels, over meals, and are meant to be more entertaining than challenging"; and those on cruise ships, which are fully unserious.



Predictive variational inference: Learn the predictively optimal posterior distribution

arXiv.org Machine Learning

Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification. We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density such that the resulting posterior predictive distribution is as close to the true data generating process as possible, where this closeness is measured by multiple scoring rules. By optimizing the objective, the predictive variational inference is generally not the same as, or even attempting to approximate, the Bayesian posterior, even asymptotically. Rather, we interpret it as implicit hierarchical expansion. Further, the learned posterior uncertainty detects heterogeneity of parameters among the population, enabling automatic model diagnosis. This framework applies to both likelihood-exact and likelihood-free models. We demonstrate its application in real data examples.


Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition

arXiv.org Artificial Intelligence

Combining predictions from different models is a central problem in Bayesian inference and machine learning more broadly. Currently, these predictive distributions are almost exclusively combined using linear mixtures such as Bayesian model averaging, Bayesian stacking, and mixture of experts. Such linear mixtures impose idiosyncrasies that might be undesirable for some applications, such as multi-modality. While there exist alternative strategies (e.g. geometric bridge or superposition), optimising their parameters usually involves computing an intractable normalising constant repeatedly. We present two novel Bayesian model combination tools. These are generalisations of model stacking, but combine posterior densities by log-linear pooling (locking) and quantum superposition (quacking). To optimise model weights while avoiding the burden of normalising constants, we investigate the Hyvarinen score of the combined posterior predictions. We demonstrate locking with an illustrative example and discuss its practical application with importance sampling.
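To make the contrast between linear mixtures and log-linear pooling concrete, here is a minimal sketch, not taken from the paper. It assumes two Gaussian predictive densities and weights that sum to one; the function names and the example values are hypothetical. For Gaussians, the log-linear pool has a closed form (a precision-weighted Gaussian), so no normalising constant needs to be computed numerically in this special case. Quacking (superposition) is not covered here.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def linear_mixture_pdf(x, weights, mus, sigmas):
    """Linear pool: weighted average of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def log_pool_gaussians(weights, mus, sigmas):
    """Log-linear pool of Gaussians, prod_k N(mu_k, s_k^2)^{w_k}, renormalised.

    The result is again Gaussian; returns its (mean, sd).
    """
    precision = sum(w / s ** 2 for w, s in zip(weights, sigmas))
    mean = sum(w * m / s ** 2 for w, m, s in zip(weights, mus, sigmas)) / precision
    return mean, 1.0 / math.sqrt(precision)

def count_modes(pdf, lo, hi, steps=2000):
    """Count strict local maxima of pdf on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [pdf(x) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Hypothetical example: two well-separated models with equal weight.
weights = [0.5, 0.5]
mus = [-3.0, 3.0]
sigmas = [1.0, 1.0]

pooled_mean, pooled_sd = log_pool_gaussians(weights, mus, sigmas)
mixture_modes = count_modes(
    lambda x: linear_mixture_pdf(x, weights, mus, sigmas), -8.0, 8.0
)
pooled_modes = count_modes(
    lambda x: normal_pdf(x, pooled_mean, pooled_sd), -8.0, 8.0
)
```

In this toy setting the linear mixture keeps both modes, while the log-linear pool collapses to a single Gaussian centred between them, which illustrates the multi-modality point made in the abstract.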


Strengthening trust in machine-learning models

#artificialintelligence

Probabilistic machine learning methods are becoming increasingly powerful tools in data analysis, informing a range of critical decisions across disciplines and applications, from forecasting election results to predicting the impact of microloans on addressing poverty. This class of methods uses sophisticated concepts from probability theory to handle uncertainty in decision-making. But the math is only one piece of the puzzle in determining their accuracy and effectiveness. In a typical data analysis, researchers make many subjective choices, or potentially introduce human error, that must also be assessed in order to cultivate users' trust in the quality of decisions based on these methods. To address this issue, MIT computer scientist Tamara Broderick, associate professor in the Department of Electrical Engineering and Computer Science (EECS) and a member of the Laboratory for Information and Decision Systems (LIDS), and a team of researchers have developed a classification system--a "taxonomy of trust"--that defines where trust might break down in a data analysis and identifies strategies to strengthen trust at each step.


Global Big Data Conference

#artificialintelligence

A Columbia University research team affiliated with the Data Science Institute (DSI) has received a Facebook Probability and Programming research award to develop static analysis methods that will enhance the usability and accuracy of probabilistic programming. The team includes Jeannette M. Wing, DSI's Avanessians Director and Professor of Computer Science; Andrew Gelman, Professor of Statistics and Political Science and DSI member; and Ryan Bernstein, a doctoral student in computer science who is co-advised by Wing and Gelman. The three will conduct a static analysis of Stan, an open-source probabilistic programming language developed mainly at Columbia that describes statistical models. Their analysis will make it easier for users to reliably design statistical and machine learning models in high-level programming languages, according to Gelman, who is a co-principal investigator on the award. "Stan is used in applications ranging from drug development [for Novartis] to political polling and forecasting [for YouGov and The Economist]," Gelman said.


Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors

arXiv.org Machine Learning

When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms can have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights in the posterior. Here we propose an alternative approach, using parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible, and then combining these using importance-sampling-based Bayesian stacking, a scalable method for constructing a weighted average of distributions so as to maximize cross-validated prediction utility. The result from stacking is not necessarily equivalent, even asymptotically, to fully Bayesian inference, but it serves many of the same goals. Under misspecified models, stacking can give better predictive performance than full Bayesian inference, hence the multimodality can be considered a blessing rather than a curse. We explore with an example where the stacked inference approximates the true data generating process from the misspecified model, an example of inconsistent inference, and non-mixing samplers. We elaborate the practical implementation in the context of latent Dirichlet allocation, Gaussian process regression, hierarchical models, variational inference in horseshoe regression, and neural networks.
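The weighting step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the held-out (for example, leave-one-out) pointwise predictive densities for each chain or mode are already available as a table, and it swaps in a simple EM-style fixed-point update to maximise the log score over the simplex rather than whatever optimiser the paper uses. The names `stacking_weights`, `log_score`, and the numbers in `pred_dens` are hypothetical.

```python
import math

def log_score(pred_dens, weights):
    """Sum over data points of the log of the weighted predictive density."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row))) for row in pred_dens
    )

def stacking_weights(pred_dens, iters=500):
    """Find simplex weights that maximise the log score.

    pred_dens[i][k] is the held-out predictive density of data point i
    under component k (a chain, mode, or approximation).
    Uses the EM fixed-point update for mixture weights with fixed
    components, which never decreases the log score.
    """
    n = len(pred_dens)
    k = len(pred_dens[0])
    weights = [1.0 / k] * k  # start from uniform weights
    for _ in range(iters):
        new_weights = [0.0] * k
        for row in pred_dens:
            mix = sum(w * p for w, p in zip(weights, row))
            for j in range(k):
                new_weights[j] += weights[j] * row[j] / mix
        weights = [v / n for v in new_weights]
    return weights

# Hypothetical held-out densities: 5 data points, 3 components.
# The third component predicts every point worse than the other two.
pred_dens = [
    [0.30, 0.05, 0.01],
    [0.25, 0.10, 0.02],
    [0.05, 0.40, 0.01],
    [0.10, 0.35, 0.02],
    [0.20, 0.20, 0.01],
]

uniform = [1.0 / 3.0] * 3
weights = stacking_weights(pred_dens)
```

Here the third component is dominated on every point, so its weight shrinks toward zero, while the first two share the remaining weight according to how well each predicts the held-out data.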


For Better Science, Bring on the Revolutionaries

Slate

A leading biologist at Harvard, Pardis Sabeti, has called out the replication movement in psychology, calling it a "cautionary tale" of how efforts to reform research may "end up destroying new ideas before they are fully explored." Her argument, in short, is that the "vicious" debate over statistical errors in that field has only stymied further progress. There's "a better way forward," Sabeti says, "through evolution, not revolution." For comparison, she describes what happened in her own field of human genomics: A rash of false-positive results gave way about 10 years ago, without much fuss or incivility, to a new and better way of doing science. "We emerged more engaged, productive, successful, and united," she says.


Understanding overfitting: an inaccurate meme in Machine Learning

@machinelearnbot

This post was inspired by a recent post by Andrew Gelman, who defined 'overfitting' as follows: Overfitting is when you have a complicated model that gives worse predictions, on average, than a simpler model. Preamble: There is a lot of confusion among practitioners regarding the concept of overfitting. A common claim is that applying cross-validation prevents overfitting, and that good out-of-sample performance (low generalisation error on unseen data) indicates that a model is not overfit. This statement is of course not true: cross-validation does not prevent your model from overfitting, and good out-of-sample performance does not guarantee a model that is not overfitted. What people actually refer to in one aspect of this statement is called overtraining.
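The quoted definition can be illustrated with a small simulation. This is a hedged sketch that is not part of the original post: it assumes the outcome is pure noise unrelated to the input, and it compares a hypothetical "complicated" model (a 1-nearest-neighbour lookup that memorises the training set) against a "simple" one (the training mean). All names and settings here are illustrative.

```python
import random

def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

def simulate(n=500, seed=0):
    """Compare a memorising model with a constant model on pure-noise data."""
    rng = random.Random(seed)
    # Assumption: x carries no information about y (y is standard normal noise).
    train = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n)]
    test = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n)]

    mean_y = sum(y for _, y in train) / n

    def simple(x):
        # Simple model: always predict the training mean.
        return mean_y

    def complicated(x):
        # Complicated model: return the y of the nearest training x.
        return min(train, key=lambda pair: abs(pair[0] - x))[1]

    return {
        "simple_train": mse(simple, train),
        "simple_test": mse(simple, test),
        "complicated_train": mse(complicated, train),
        "complicated_test": mse(complicated, test),
    }

result = simulate()
```

The memorising model reproduces the training data exactly, yet predicts new data worse than the constant model, which is the situation the quoted definition describes.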