Poole, Ben
SimVS: Simulating World Inconsistencies for Robust View Synthesis
Trevithick, Alex, Paiss, Roni, Henzler, Philipp, Verbin, Dor, Wu, Rundi, Alzayer, Hadi, Gao, Ruiqi, Poole, Ben, Barron, Jonathan T., Holynski, Aleksander, Ramamoorthi, Ravi, Srinivasan, Pratul P.
Novel-view synthesis techniques achieve impressive results for static scenes but struggle when faced with the inconsistencies inherent to casual capture settings: varying illumination, scene motion, and other unintended effects that are difficult to model explicitly. We present an approach for leveraging generative video models to simulate the inconsistencies in the world that can occur during capture. We use this process, along with existing multi-view datasets, to create synthetic data for training a multi-view harmonization network that is able to reconcile inconsistent observations into a consistent 3D scene. We demonstrate that our world-simulation strategy significantly outperforms traditional augmentation methods in handling real-world scene variations, thereby enabling highly accurate static 3D reconstructions in the presence of a variety of challenging inconsistencies. Project page: https://alextrevithick.github.io/simvs
EM Distillation for One-step Diffusion Models
Xie, Sirui, Xiao, Zhisheng, Kingma, Diederik P, Hou, Tingbo, Wu, Ying Nian, Murphy, Kevin Patrick, Salimans, Tim, Poole, Ben, Gao, Ruiqi
While diffusion models can learn complex distributions, sampling requires a computationally expensive iterative process. Existing distillation methods enable efficient sampling, but have notable limitations, such as performance degradation with very few sampling steps, reliance on training data access, or mode-seeking optimization that may fail to capture the full distribution. We propose EM Distillation (EMD), a maximum likelihood-based approach that distills a diffusion model to a one-step generator model with minimal loss of perceptual quality. Our approach is derived through the lens of Expectation-Maximization (EM), where the generator parameters are updated using samples from the joint distribution of the diffusion teacher prior and inferred generator latents. We develop a reparametrized sampling scheme and a noise cancellation technique that together stabilize the distillation process. We further reveal an interesting connection between our method and existing methods that minimize the mode-seeking KL. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and compares favorably with prior work on distilling text-to-image diffusion models.
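As a rough illustration of the EM-style alternation described above (not the paper's actual estimator), the toy PyTorch sketch below corrects one-step generator samples with a few Langevin-like steps under a frozen teacher score network (the E-step), then regresses the generator toward the corrected samples for the same latents (the M-step). The untrained networks, 2D data, fixed step sizes, and lack of noise-level conditioning are all simplifying assumptions.

```python
import torch
import torch.nn as nn

D = 2                                       # toy data dimensionality
# Stand-ins: in practice the teacher is a pretrained, noise-level-conditioned
# diffusion/score model and the student is a one-step generator g_theta(z).
teacher = nn.Sequential(nn.Linear(D, 64), nn.SiLU(), nn.Linear(64, D))
generator = nn.Sequential(nn.Linear(D, 64), nn.SiLU(), nn.Linear(64, D))
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)

for step in range(100):
    z = torch.randn(256, D)                 # generator latents
    x = generator(z).detach()               # E-step init: current student samples
    # E-step (schematic): a few Langevin-like corrections under the frozen teacher.
    eps = 1e-2
    for _ in range(5):
        with torch.no_grad():
            score = teacher(x)
        x = x + 0.5 * eps * score + eps ** 0.5 * torch.randn_like(x)
    # M-step (schematic): pull g_theta(z) toward the corrected samples.
    loss = ((generator(z) - x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```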
Disentangled 3D Scene Generation with Layout Learning
Epstein, Dave, Poole, Ben, Mildenhall, Ben, Efros, Alexei A., Holynski, Aleksander
We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/
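To make the joint optimization of objects and layouts concrete, the sketch below is a deliberately simplified 2D stand-in: learnable RGBA layers play the role of per-object NeRFs, learnable affine transforms play the role of layouts, and a random untrained convolutional scorer stands in for the pretrained text-to-image prior. None of these components match the paper's implementation; the sketch only shows the structure of compositing the objects and optimizing both sets of parameters together.

```python
import torch
import torch.nn.functional as F

K, H, W = 3, 64, 64
objects = torch.randn(K, 4, H, W, requires_grad=True)                    # per-object RGBA layers
theta = torch.eye(2, 3).repeat(K, 1, 1).clone().requires_grad_(True)     # per-object affine layouts
scorer = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
                             torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
                             torch.nn.Linear(8, 1))                      # placeholder image prior
opt = torch.optim.Adam([objects, theta], lr=1e-2)

for step in range(100):
    grid = F.affine_grid(theta, [K, 4, H, W], align_corners=False)
    placed = F.grid_sample(objects, grid, align_corners=False)           # objects moved by their layouts
    rgb, alpha = torch.sigmoid(placed[:, :3]), torch.sigmoid(placed[:, 3:])
    canvas = torch.zeros(1, 3, H, W)
    for k in range(K):                                                   # simple back-to-front compositing
        canvas = alpha[k] * rgb[k] + (1 - alpha[k]) * canvas
    loss = -scorer(canvas).mean()                                        # "be in-distribution" under the prior
    opt.zero_grad()
    loss.backward()
    opt.step()
```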
Variational Prediction
Alemi, Alexander A., Poole, Ben
Bayesian inference offers benefits over maximum likelihood, but it also comes with computational costs. Computing the posterior is typically intractable, as is marginalizing that posterior to form the posterior predictive distribution. In this paper, we present variational prediction, a technique for directly learning a variational approximation to the posterior predictive distribution using a variational bound. This approach can provide good predictive distributions without test time marginalization costs. We demonstrate Variational Prediction on an illustrative toy example.
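For context, one standard variational lower bound on the log posterior predictive (stated here as background; it is not necessarily the exact bound the paper optimizes) follows from Jensen's inequality with any variational distribution q(θ):

\log p(y \mid x, \mathcal{D})
  = \log \int p(y \mid x, \theta)\, p(\theta \mid \mathcal{D})\, d\theta
  \;\ge\; \mathbb{E}_{q(\theta)}\!\left[\log p(y \mid x, \theta)\right]
  - \mathrm{KL}\!\left(q(\theta)\,\|\,p(\theta \mid \mathcal{D})\right).

The paper's approach instead learns a variational approximation to the predictive distribution itself, so that no marginalization over parameters is needed at test time.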
Diffusion Self-Guidance for Controllable Image Generation
Epstein, Dave, Jabri, Allan, Poole, Ben, Efros, Alexei A., Holynski, Aleksander
Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/
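The schematic sketch below shows the general shape of a self-guidance-style sampling step: a scalar property is computed from the model's own intermediate activations, and its gradient with respect to the noisy image nudges the denoising update, much like classifier guidance. The tiny untrained denoiser, the center-of-mass "location" property, and the plain update rule are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.feat = nn.Conv2d(3, 8, 3, padding=1)      # "internal representation"
        self.out = nn.Conv2d(8, 3, 3, padding=1)       # epsilon prediction
    def forward(self, x):
        h = torch.relu(self.feat(x))
        return self.out(h), h

def object_x_position(features):
    # Toy property: horizontal center of mass of one feature channel.
    attn = features[:, 0].flatten(1).softmax(-1).view_as(features[:, 0])
    xs = torch.linspace(0, 1, features.shape[-1])
    return (attn.sum(dim=1) * xs).sum(dim=-1).mean()

model = TinyDenoiser()
x = torch.randn(1, 3, 32, 32)
target_pos, scale = 0.8, 5.0                           # e.g. "move the object to the right"
for t in range(50, 0, -1):
    x = x.detach().requires_grad_(True)
    eps, feats = model(x)
    loss = (object_x_position(feats) - target_pos) ** 2
    grad = torch.autograd.grad(loss, x)[0]             # self-guidance signal from internal features
    with torch.no_grad():
        x = x - 0.02 * eps - scale * grad              # guided (schematic) denoising update
```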
Learning a Diffusion Prior for NeRFs
Yang, Guandao, Kundu, Abhijit, Guibas, Leonidas J., Barron, Jonathan T., Poole, Ben
Neural Radiance Fields (NeRFs) have emerged as a powerful neural 3D representation for objects and scenes derived from 2D data. Generating NeRFs, however, remains difficult in many scenarios. For instance, training a NeRF with only a small number of views as supervision remains challenging because the problem is under-constrained. Such settings call for an inductive prior to filter out bad local minima. One way to introduce such a prior is to learn a generative model over NeRFs for a certain class of scenes. In this paper, we propose to use a diffusion model to generate NeRFs encoded on a regularized grid. We show that our model can sample realistic NeRFs while also allowing conditional generation given an observation as guidance.
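The sketch below shows the kind of denoising-diffusion training step this setup implies when each scene is represented as a regularized feature grid. The 3D convolutional denoiser, grid size, and noise schedule are placeholder assumptions rather than the paper's architecture, and a real denoiser would also be conditioned on the noise level.

```python
import torch
import torch.nn as nn

C, G = 4, 16                                     # feature channels, grid resolution
denoiser = nn.Sequential(nn.Conv3d(C, 32, 3, padding=1), nn.SiLU(),
                         nn.Conv3d(32, C, 3, padding=1))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-4)
grids = torch.randn(128, C, G, G, G)             # stand-in dataset of grid-encoded NeRFs

for step in range(100):
    x0 = grids[torch.randint(0, len(grids), (8,))]
    t = torch.rand(8, 1, 1, 1, 1)                # continuous noise level in [0, 1]
    alpha = torch.cos(t * torch.pi / 2)          # simple cosine-style schedule
    sigma = torch.sin(t * torch.pi / 2)
    noise = torch.randn_like(x0)
    xt = alpha * x0 + sigma * noise              # noised grid
    loss = ((denoiser(xt) - noise) ** 2).mean()  # epsilon-prediction objective
    opt.zero_grad()
    loss.backward()
    opt.step()
```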
DreamBooth3D: Subject-Driven Text-to-3D Generation
Raj, Amit, Kaza, Srinivas, Poole, Ben, Niemeyer, Michael, Ruiz, Nataniel, Mildenhall, Ben, Zada, Shiran, Aberman, Kfir, Rubinstein, Michael, Barron, Jonathan, Li, Yuanzhen, Jampani, Varun
We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.
VeLO: Training Versatile Learned Optimizers by Scaling Up
Metz, Luke, Harrison, James, Freeman, C. Daniel, Merchant, Amil, Beyer, Lucas, Bradbury, James, Agrawal, Naman, Poole, Ben, Mordatch, Igor, Roberts, Adam, Sohl-Dickstein, Jascha
While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
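The interface of such a learned optimizer can be sketched as follows: a small network maps per-parameter features (here just the raw gradient and a momentum accumulator) to parameter updates for an inner task. The feature set, network size, and the fact that the optimizer below is untrained rather than meta-trained are simplifying assumptions; the sketch shows only the interface, not VeLO itself.

```python
import torch
import torch.nn as nn

learned_opt = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

def apply_learned_optimizer(params, momenta, beta=0.9, scale=1e-3):
    with torch.no_grad():
        for p, m in zip(params, momenta):
            if p.grad is None:
                continue
            m.mul_(beta).add_(p.grad, alpha=1 - beta)              # running gradient statistic
            feats = torch.stack([p.grad, m], dim=-1).reshape(-1, 2)
            update = learned_opt(feats).reshape(p.shape)
            p.add_(scale * update)                                 # learned, per-parameter step

# Inner task: fit a linear model; the learned optimizer replaces Adam/SGD.
model = nn.Linear(10, 1)
momenta = [torch.zeros_like(p) for p in model.parameters()]
x, y = torch.randn(64, 10), torch.randn(64, 1)
for step in range(100):
    loss = ((model(x) - y) ** 2).mean()
    model.zero_grad()
    loss.backward()
    apply_learned_optimizer(list(model.parameters()), momenta)
```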
Zero-Shot Text-Guided Object Generation with Dream Fields
Jain, Ajay, Mildenhall, Ben, Barron, Jonathan T., Abbeel, Pieter, Poole, Ben
We combine neural rendering with multi-modal image and text representations to synthesize diverse 3D objects solely from natural language descriptions. Our method, Dream Fields, can generate the geometry and color of a wide range of objects without 3D supervision. Due to the scarcity of diverse, captioned 3D data, prior methods only generate objects from a handful of categories, such as ShapeNet. Instead, we guide generation with image-text models pre-trained on large datasets of captioned images from the web. Our method optimizes a Neural Radiance Field from many camera views so that rendered images score highly with a target caption according to a pre-trained CLIP model. To improve fidelity and visual quality, we introduce simple geometric priors, including sparsity-inducing transmittance regularization, scene bounds, and new MLP architectures. In experiments, Dream Fields produce realistic, multi-view consistent object geometry and color from a variety of natural language captions.
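A structural sketch of this objective is given below, with heavy simplifications: a single learnable image and opacity map stand in for rendered NeRF views, untrained networks and random vectors stand in for the CLIP image and text encoders, and a crude mean-transmittance term stands in for the paper's transmittance regularization. It is meant only to show the shape of the loss, not the method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

image_enc = nn.Sequential(nn.Conv2d(3, 16, 8, stride=8), nn.ReLU(),
                          nn.Flatten(), nn.LazyLinear(64))     # stand-in CLIP image tower
text_embedding = F.normalize(torch.randn(1, 64), dim=-1)       # stand-in caption embedding
rgb = torch.randn(1, 3, 64, 64, requires_grad=True)            # stand-in rendered view
density = torch.randn(1, 1, 64, 64, requires_grad=True)        # stand-in per-pixel opacity logits
opt = torch.optim.Adam([rgb, density], lr=1e-2)

for step in range(200):
    alpha = torch.sigmoid(density)
    transmittance = 1 - alpha                                   # light passing through to the background
    view = alpha * torch.sigmoid(rgb)                           # composite over a black background
    img_emb = F.normalize(image_enc(view), dim=-1)
    clip_score = (img_emb * text_embedding).sum(-1).mean()      # cosine similarity to the caption
    sparsity_reward = transmittance.mean()                      # crude transmittance regularizer
    loss = -(clip_score + 0.5 * sparsity_reward)
    opt.zero_grad()
    loss.backward()
    opt.step()
```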