
Collaborating Authors: Holynski, Aleksander


SimVS: Simulating World Inconsistencies for Robust View Synthesis

arXiv.org Artificial Intelligence

Novel-view synthesis techniques achieve impressive results for static scenes but struggle when faced with the inconsistencies inherent to casual capture settings: varying illumination, scene motion, and other unintended effects that are difficult to model explicitly. We present an approach for leveraging generative video models to simulate the inconsistencies in the world that can occur during capture. We use this process, along with existing multi-view datasets, to create synthetic data for training a multi-view harmonization network that is able to reconcile inconsistent observations into a consistent 3D scene. We demonstrate that our world-simulation strategy significantly outperforms traditional augmentation methods in handling real-world scene variations, thereby enabling highly accurate static 3D reconstructions in the presence of a variety of challenging inconsistencies. Project page: https://alextrevithick.github.io/simvs
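
To make the data-generation idea concrete, here is a minimal sketch in Python (NumPy only). The simulate_inconsistency function is a toy stand-in: it only jitters global color, whereas the paper uses a generative video model to simulate lighting changes, scene motion, and other capture effects. All names are illustrative and not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

def simulate_inconsistency(view, strength=0.3):
    # Toy stand-in for the generative video model: jitter global color gain.
    gain = 1.0 + strength * rng.normal(size=(1, 1, 3))
    return np.clip(view * gain, 0.0, 1.0)

def make_training_pair(consistent_views, reference_idx=0):
    # Inputs to the harmonization network: one clean reference view plus the
    # remaining views corrupted by the simulator; targets: the original,
    # consistent multi-view set.
    inputs = [v if i == reference_idx else simulate_inconsistency(v)
              for i, v in enumerate(consistent_views)]
    targets = consistent_views
    return inputs, targets

views = [rng.random((64, 64, 3)) for _ in range(4)]   # toy multi-view set
inconsistent_views, clean_views = make_training_pair(views)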


Rethinking Score Distillation as a Bridge Between Image Distributions

arXiv.org Artificial Intelligence

Score distillation sampling (SDS) has proven to be an important tool, enabling the use of large-scale diffusion priors for tasks operating in data-poor domains. Unfortunately, SDS has a number of characteristic artifacts that limit its usefulness in general-purpose applications. In this paper, we make progress toward understanding the behavior of SDS and its variants by viewing them as solving for an optimal-cost transport path from a source distribution to a target distribution. Under this new interpretation, these methods seek to transport corrupted images (source) to the natural image distribution (target). We argue that current methods' characteristic artifacts are caused by (1) a linear approximation of the optimal path and (2) poor estimates of the source distribution. We show that calibrating the text conditioning of the source distribution can produce high-quality generation and translation results with little extra overhead. Our method can be easily applied across many domains, matching or beating the performance of specialized methods. We demonstrate its utility in text-to-2D generation, text-based NeRF optimization, translating paintings to real images, optical illusion generation, and 3D sketch-to-real. We compare our method to existing approaches for score distillation sampling and show that it can produce high-frequency details with realistic colors.
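
For context, this is the standard SDS gradient that the paper reinterprets (the original DreamFusion formulation, written in the usual notation; it is not the paper's bridging formulation itself):

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\bigl(\epsilon_\phi(x_t;\, y, t) - \epsilon\bigr)\, \frac{\partial x}{\partial \theta} \right],
\qquad x = g(\theta), \quad x_t = \alpha_t x + \sigma_t \epsilon

Here g(θ) renders an image x from the parameters being optimized (e.g., a NeRF), ε_φ is the pretrained diffusion model's noise prediction conditioned on the text y, and w(t) is a timestep weighting. In the paper's view, this update approximately transports the current, corrupted render toward the natural image distribution.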


Disentangled 3D Scene Generation with Layout Learning

arXiv.org Artificial Intelligence

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/
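
A minimal sketch of the compositing step, assuming toy Gaussian "objects" in place of NeRFs; layout entries are rigid transforms (R, t), and all function names are illustrative rather than taken from the paper.

import numpy as np

K = 3  # number of object fields

def object_density(k, pts):
    # Toy stand-in for the k-th object NeRF: a Gaussian blob at the origin.
    return np.exp(-np.sum(pts**2, axis=-1) / (0.1 * (k + 1)))

def composite_density(pts, layout):
    # layout[k] = (R, t): rigid transform placing object k in the scene,
    # with R mapping object coordinates to world coordinates. Query each
    # object in its own frame and sum densities, mirroring how per-object
    # fields are composited into one scene.
    total = np.zeros(pts.shape[0])
    for k, (R, t) in enumerate(layout):
        local = (pts - t) @ R          # world -> object coordinates
        total += object_density(k, local)
    return total

rng = np.random.default_rng(0)
layout = [(np.eye(3), rng.normal(size=3)) for _ in range(K)]
pts = rng.normal(size=(1024, 3))
sigma = composite_density(pts, layout)

In the method itself, the composited renders are what get scored against the pretrained text-to-image model, and the per-object fields and layouts are optimized jointly.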


Generative Powers of Ten

arXiv.org Artificial Intelligence

We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.
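
A crude analogue of the multi-scale consistency constraint, in NumPy: each image in the zoom stack is assumed to show the central half of the previous one at 2x zoom, and finer scales are propagated into coarser ones. The paper applies this kind of blending jointly inside the diffusion sampling loop, on intermediate estimates guided by per-scale prompts, not on finished images; names here are illustrative.

import numpy as np

def downsample2(img):
    # 2x average pooling; stand-in for the blur/downsample in the real method.
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def blend_zoom_stack(stack):
    # stack[i] is the image at zoom 2**i; each image's central half is
    # assumed to depict the next, more zoomed-in image. Propagate the finer
    # scales into the coarser ones so the stack stays consistent.
    out = [img.copy() for img in stack]
    for i in range(len(out) - 2, -1, -1):
        h, w, _ = out[i].shape
        patch = downsample2(out[i + 1])          # finer scale, shrunk 2x
        y0, x0 = h // 4, w // 4
        out[i][y0:y0 + h // 2, x0:x0 + w // 2] = patch
    return out

rng = np.random.default_rng(0)
stack = [rng.random((64, 64, 3)) for _ in range(3)]
consistent_stack = blend_zoom_stack(stack)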


Nerfbusters: Removing Ghostly Artifacts from Casually Captured NeRFs

arXiv.org Artificial Intelligence

Casually captured Neural Radiance Fields (NeRFs) suffer from artifacts such as floaters or flawed geometry when rendered outside the camera trajectory. Existing evaluation protocols often do not capture these effects, since they usually only assess image quality at every 8th frame of the training capture. To push forward progress in novel-view synthesis, we propose a new dataset and evaluation procedure in which two camera trajectories of the scene are recorded: one used for training, and the other for evaluation. In this more challenging in-the-wild setting, we find that existing hand-crafted regularizers neither remove floaters nor improve scene geometry. We therefore propose a 3D diffusion-based method that leverages local 3D priors and a novel density-based score distillation sampling loss to discourage artifacts during NeRF optimization. We show that this data-driven prior removes floaters and improves scene geometry for casual captures.
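
A hedged sketch of a score-distillation-style update on local density cubes. The smooth3d "prior" below is only a stand-in (the paper uses a learned 3D diffusion model over local geometry), and the update is a generic SDS-shaped analogue rather than the paper's exact density-based loss; all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def smooth3d(x):
    # Toy "denoiser": average each voxel with its six neighbours. The paper
    # instead queries a learned 3D diffusion prior over local cubes.
    acc = x.copy()
    for axis in range(3):
        acc += np.roll(x, 1, axis) + np.roll(x, -1, axis)
    return acc / 7.0

def density_distillation_step(cube, sigma=0.5, lr=0.1):
    # SDS-shaped update on a local density cube sampled from the NeRF:
    # noise the cube, ask the prior for a cleaned version, and step along
    # (predicted noise - injected noise), which nudges densities toward
    # what the prior considers plausible local geometry.
    noise = rng.normal(size=cube.shape)
    noisy = cube + sigma * noise
    cleaned = smooth3d(noisy)
    eps_pred = (noisy - cleaned) / sigma
    return cube - lr * (eps_pred - noise)

cube = rng.random((16, 16, 16))   # densities queried in a small scene region
cube = density_distillation_step(cube)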


State of the Art on Diffusion Models for Visual Computing

arXiv.org Artificial Intelligence

The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has grown exponentially, and relevant papers are published across the computer graphics, computer vision, and AI communities, with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models and the implementation details and design choices of the popular Stable Diffusion model, and to give an overview of important aspects of these generative AI tools, including personalization, conditioning, and inversion. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point for researchers, artists, and practitioners alike to explore this exciting topic.


Diffusion Self-Guidance for Controllable Image Generation

arXiv.org Artificial Intelligence

Large-scale generative models are capable of producing high-quality images from detailed text descriptions. However, many aspects of an image are difficult or impossible to convey through text. We introduce self-guidance, a method that provides greater control over generated images by guiding the internal representations of diffusion models. We demonstrate that properties such as the shape, location, and appearance of objects can be extracted from these representations and used to steer sampling. Self-guidance works similarly to classifier guidance, but uses signals present in the pretrained model itself, requiring no additional models or training. We show how a simple set of properties can be composed to perform challenging image manipulations, such as modifying the position or size of objects, merging the appearance of objects in one image with the layout of another, composing objects from many images into one, and more. We also show that self-guidance can be used to edit real images. For results and an interactive demo, see our project page at https://dave.ml/selfguidance/
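
The mechanism can be sketched with a toy model: compute a differentiable property from the network's internal features, take its gradient with respect to the noisy input, and add that gradient to the noise prediction during sampling. Everything below (the tiny denoiser, the "horizontal centroid" property, the guidance scale) is an illustrative stand-in, not the paper's architecture or exact weighting.

import torch

class ToyDenoiser(torch.nn.Module):
    # Tiny "denoiser" whose intermediate feature map plays the role of the
    # internal representations (e.g. attention maps) that properties are
    # read from.
    def __init__(self):
        super().__init__()
        self.feat = torch.nn.Conv2d(3, 8, 3, padding=1)
        self.out = torch.nn.Conv2d(8, 3, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.feat(x))
        return self.out(h), h        # noise estimate + internal features

def centroid_x(features):
    # A simple differentiable property: horizontal centre of mass of the
    # feature activations, standing in for e.g. an object's position.
    w = features.shape[-1]
    xs = torch.linspace(0, 1, w)
    weights = features.mean(dim=(0, 1, 2))          # one value per column
    return (weights * xs).sum() / (weights.sum() + 1e-8)

model = ToyDenoiser()
x_t = torch.randn(1, 3, 32, 32, requires_grad=True)

eps, feats = model(x_t)
target = torch.tensor(0.75)                          # move the "object" right
prop_loss = (centroid_x(feats) - target) ** 2
grad = torch.autograd.grad(prop_loss, x_t)[0]

guidance_scale = 5.0                                 # illustrative value
eps_guided = eps.detach() + guidance_scale * grad    # steer this sampling step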


InstructPix2Pix: Learning to Follow Image Editing Instructions

arXiv.org Artificial Intelligence

We propose a method for editing images from human instructions: given an input image and a written instruction that tells the model what to do, our model follows the instruction to edit the image. To obtain training data for this problem, we combine the knowledge of two large pretrained models -- a language model (GPT-3) and a text-to-image model (Stable Diffusion) -- to generate a large dataset of image editing examples. Our conditional diffusion model, InstructPix2Pix, is trained on this generated data and generalizes to real images and user-written instructions at inference time. Since it performs edits in a single forward pass and does not require per-example fine-tuning or inversion, our model edits images quickly, in a matter of seconds. We show compelling editing results for a diverse collection of input images and written instructions.
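
A minimal sketch of the paired-data generation described above, with toy stand-ins: propose_edit stands in for the finetuned GPT-3 that writes an instruction plus the edited caption, and text_to_image stands in for Stable Diffusion (the paper additionally uses Prompt-to-Prompt and filtering so the before/after images stay aligned). All names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def propose_edit(caption):
    # Stand-in for the language-model step: given an input caption, produce
    # an editing instruction and the caption of the edited image. Here we use
    # a fixed template; the real model generates diverse edits.
    return ("make it snowy", caption + ", covered in snow")

def text_to_image(caption):
    # Stand-in for the text-to-image model; returns a random "image".
    return rng.random((64, 64, 3))

def make_example(caption):
    instruction, edited_caption = propose_edit(caption)
    before = text_to_image(caption)
    after = text_to_image(edited_caption)
    # Training triplet for the conditional diffusion model:
    # (input image, instruction) -> edited image.
    return before, instruction, after

example = make_example("a photograph of a mountain village")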