AITopics | Dustin Tran

Discrete Flows: Invertible Generative Models of Discrete Data

Dustin Tran, Keyon Vafa, Kumar Agrawal, Laurent Dinh, Ben Poole

Neural Information Processing Systems, Jan-27-2025, 14:49:06 GMT

While normalizing flows have led to significant advances in modeling high-dimensional continuous distributions, their applicability to discrete distributions remains unknown. In this paper, we show that flows can in fact be extended to discrete events, using a simple change-of-variables formula that does not require log-determinant-Jacobian computations. Discrete flows have numerous applications. We consider two flow architectures: discrete autoregressive flows that enable bidirectionality, allowing, for example, tokens in text to depend on both left-to-right and right-to-left contexts in an exact language model; and discrete bipartite flows that enable efficient non-autoregressive generation as in RealNVP. Empirically, we find that discrete autoregressive flows outperform autoregressive baselines on synthetic discrete distributions, an addition task, and Potts models; and bipartite flows can obtain competitive performance with autoregressive baselines on character-level language modeling for Penn Tree Bank and text8.
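The discrete change-of-variables idea can be illustrated with a toy sketch. It assumes a modular location-scale transform y = (MU + SIGMA * x) mod K on a single K-ary variable, with MU and SIGMA fixed by hand. In a learned model these would come from a network, and SIGMA must be invertible modulo K. The map is a bijection on {0, 1, ..., K-1}, so the probability of y is the probability of its preimage, and no Jacobian term appears. The names `forward`, `inverse` and `flow_pmf` are hypothetical.

```python
from math import gcd

K = 5              # number of discrete states
MU, SIGMA = 2, 3   # hypothetical fixed location and scale
assert gcd(SIGMA, K) == 1, "SIGMA must be invertible modulo K"

def forward(x):
    """Map a state x to y = (MU + SIGMA * x) mod K."""
    return (MU + SIGMA * x) % K

def inverse(y):
    """Undo forward() using the modular inverse of SIGMA (Python 3.8+)."""
    sigma_inv = pow(SIGMA, -1, K)
    return (sigma_inv * (y - MU)) % K

def flow_pmf(base_pmf):
    """Discrete change of variables: p_y(y) = p_x(inverse(y)), with no Jacobian."""
    return [base_pmf[inverse(y)] for y in range(K)]

base = [0.5, 0.2, 0.15, 0.1, 0.05]
transformed = flow_pmf(base)
print(transformed)
```

The transform only permutes probability mass among states, so the result remains a valid distribution.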

arxiv preprint arxiv, machine learning, natural language

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)

Bayesian Layers: A Module for Neural Network Uncertainty

Dustin Tran, Mike Dusenberry, Mark van der Wilk, Danijar Hafner

Neural Information Processing Systems, Jan-21-2025, 21:25:57 GMT

We describe Bayesian Layers, a module designed for fast experimentation with neural network uncertainty. It extends neural network libraries with drop-in replacements for common layers. This enables composition via a unified abstraction over deterministic and stochastic functions and allows for scalability via the underlying system. These layers capture uncertainty over weights (Bayesian neural nets), pre-activation units (dropout), activations ("stochastic output layers"), or the function itself (Gaussian processes). They can also be reversible to propagate uncertainty from input to output. We include code examples for common architectures such as Bayesian LSTMs, deep GPs, and flow-based models. As a demonstration, we fit a 5-billion parameter "Bayesian Transformer" on 512 TPUv2 cores for uncertainty in machine translation and a Bayesian dynamics model for model-based planning. Finally, we show how Bayesian Layers can be used within the Edward2 language for probabilistic programming with stochastic processes.
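The "drop-in replacement" idea can be made concrete with a minimal, hypothetical sketch in plain Python. It is not the Bayesian Layers or Edward2 API. The sketch defines a dense layer with the same call signature as a deterministic one, whose weights are redrawn from a factorized Gaussian on every call via the reparameterization w = mu + sigma * eps. The class names `Dense` and `GaussianDense` are assumptions for illustration. A trainable version would also learn the means and log standard deviations and add a KL penalty to the loss.

```python
import math
import random

class Dense:
    """Deterministic dense layer: y = W x + b, with W stored as rows [out][in]."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def __call__(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

class GaussianDense(Dense):
    """Hypothetical stochastic drop-in: each call samples W ~ N(mean, std^2) elementwise."""
    def __init__(self, weight_mean, weight_log_std, bias, rng=None):
        super().__init__(weight_mean, bias)
        self.weight_log_std = weight_log_std
        self.rng = rng or random.Random()

    def __call__(self, x):
        # Reparameterization: w = mean + exp(log_std) * eps, with eps ~ N(0, 1).
        sampled = [[m + math.exp(s) * self.rng.gauss(0.0, 1.0)
                    for m, s in zip(mrow, srow)]
                   for mrow, srow in zip(self.weights, self.weight_log_std)]
        return Dense(sampled, self.bias)(x)

mean = [[1.0, -1.0], [0.5, 0.5]]
log_std = [[-2.0, -2.0], [-2.0, -2.0]]
bias = [0.0, 1.0]
x = [2.0, 1.0]

det = Dense(mean, bias)
bayes = GaussianDense(mean, log_std, bias, rng=random.Random(0))
samples = [bayes(x) for _ in range(1000)]  # Monte Carlo over weight draws
predictive_mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
print(det(x), predictive_mean)
```

Both layers are called the same way, so the stochastic layer can replace the deterministic one without changing surrounding code. Averaging its outputs over many calls approximates a predictive mean.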

artificial intelligence, bayesian layer, machine learning

Neural Information Processing Systems

Country: North America > Canada (0.28)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hierarchical Implicit Models and Likelihood-Free Variational Inference

Dustin Tran, Rajesh Ranganath, David Blei

Neural Information Processing Systems, Oct-8-2024, 00:21:18 GMT

Implicit probabilistic models are a flexible class of models defined by a simulation process for data. They form the basis for theories which encompass our understanding of the physical world. Despite this fundamental nature, the use of implicit models remains limited due to challenges in specifying complex latent structure in them, and in performing inferences in such models with large data sets.
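"Defined by a simulation process" can be illustrated with a toy simulator. It has a global latent shared by all data points and a local latent per point. Observations are a nonlinear function of the latents and noise. Drawing samples is easy, but the code never evaluates the density of an observation, and for realistic simulators that density is typically unavailable. The simulator (`simulate`, the tanh map, the cubic noise) is an invented example, not a model from the paper.

```python
import math
import random

def simulate(num_points, rng):
    """Toy hierarchical implicit model (hypothetical, for illustration).

    global latent:  beta ~ N(0, 1), shared by every data point
    local latent:   z_n ~ N(beta, 1), one per data point
    observation:    x_n = tanh(z_n) + 0.1 * eps_n**3, with eps_n ~ N(0, 1)

    The function only samples; it never evaluates p(x_n | z_n, beta).
    """
    beta = rng.gauss(0.0, 1.0)
    zs, xs = [], []
    for _ in range(num_points):
        z = rng.gauss(beta, 1.0)
        eps = rng.gauss(0.0, 1.0)
        zs.append(z)
        xs.append(math.tanh(z) + 0.1 * eps ** 3)
    return beta, zs, xs

beta, zs, xs = simulate(5, random.Random(42))
print(f"beta = {beta:+.3f}")
for z, x in zip(zs, xs):
    print(f"z = {z:+.3f}  x = {x:+.3f}")
```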

artificial intelligence, bayesian inference, machine learning

Neural Information Processing Systems

Country: North America > United States (0.68)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Mesh-TensorFlow: Deep Learning for Supercomputers

Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

Neural Information Processing Systems, Oct-7-2024, 08:33:37 GMT

Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network (DNN) training strategy, due to its universal applicability and its amenability to Single-Program-Multiple-Data (SPMD) programming. However, batch-splitting suffers from problems including the inability to train very large models (due to memory constraints), high latency, and inefficiency at small batch sizes. All of these can be solved by more general distribution strategies (model-parallelism). Unfortunately, efficient model-parallel algorithms tend to be complicated to discover, describe, and implement, particularly on large clusters. We introduce Mesh-TensorFlow, a language for specifying a general class of distributed tensor computations.
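The contrast between batch-splitting and model-parallelism can be sketched in plain Python. This does not use Mesh-TensorFlow's API, and "devices" are simulated as list entries. Under batch-splitting, every device holds the full weight matrix and a slice of the examples. Under one simple model-parallel layout, each device holds a slice of the weight matrix's output columns and sees the whole batch. Both layouts reproduce the unsplit product, but only the second divides weight memory across devices. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply an [n][k] matrix by a [k][m] matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def split(seq, parts):
    """Split a list into `parts` contiguous, equal-size chunks."""
    assert len(seq) % parts == 0, "this sketch assumes an even split"
    size = len(seq) // parts
    return [seq[i * size:(i + 1) * size] for i in range(parts)]

def batch_split(x, w, devices):
    """Data parallelism: each device gets a slice of the batch and ALL of w."""
    outputs = [matmul(x_shard, w) for x_shard in split(x, devices)]
    return [row for shard in outputs for row in shard]  # concatenate along the batch

def model_split(x, w, devices):
    """Model parallelism: each device gets the full batch and a slice of w's columns."""
    columns = list(zip(*w))  # transpose w to [m][k]
    w_shards = [[list(r) for r in zip(*cols)] for cols in split(columns, devices)]
    outputs = [matmul(x, w_shard) for w_shard in w_shards]  # each is [n][m / devices]
    return [sum((out[i] for out in outputs), []) for i in range(len(x))]  # concat columns

x = [[1, 2], [3, 4], [5, 6], [7, 8]]  # batch of 4 examples, 2 features
w = [[1, 0, 2, 1], [0, 1, 1, 3]]      # 2 inputs -> 4 outputs
full = matmul(x, w)
print(batch_split(x, w, 2) == full, model_split(x, w, 2) == full)
```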

artificial intelligence, dimension, machine learning

Neural Information Processing Systems

Country: North America > United States (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Simple, Distributed, and Accelerated Probabilistic Programming

Dustin Tran, Matthew W. Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul

Neural Information Processing Systems, Oct-7-2024, 06:23:15 GMT

We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem.

artificial intelligence, international conference, machine learning

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

