
Supplementary Material for DreamHuman: Animatable 3D Avatars from Text

This document contains additional details and experiments that did not fit in the main text due to space constraints.

Neural Information Processing Systems

For animations and additional results, please also see the included videos. We use an optimization strategy similar to DreamFusion's, so unless otherwise noted the hyperparameters remain the same. As in DreamFusion, we train on a TPUv4 machine with 4 chips. We increase the number of optimization iterations from 15,000 to 50,000; we did not observe any significant benefit from training for more iterations.


Sketch2NeRF: Multi-view Sketch-guided Text-to-3D Generation

Chen, Minglin, Yuan, Weihao, Wang, Yukun, Sheng, Zhe, He, Yisheng, Dong, Zilong, Bo, Liefeng, Guo, Yulan

arXiv.org Artificial Intelligence

Recently, text-to-3D approaches have achieved high-fidelity 3D content generation from text descriptions. However, the generated objects are stochastic and lack fine-grained control. Sketches provide a cheap way to introduce such fine-grained control, yet achieving flexible control from sketches is challenging due to their abstraction and ambiguity. In this paper, we present a multi-view sketch-guided text-to-3D generation framework (namely, Sketch2NeRF) to add sketch control to 3D generation. Specifically, our method leverages pretrained 2D diffusion models (e.g., Stable Diffusion and ControlNet) to supervise the optimization of a 3D scene represented by a neural radiance field (NeRF). We propose a novel synchronized generation and reconstruction method to effectively optimize the NeRF. In the experiments, we collected two kinds of multi-view sketch datasets to evaluate the proposed method. We demonstrate that our method can synthesize 3D-consistent content with fine-grained sketch control while remaining faithful to the text prompts. Extensive results show that our method achieves state-of-the-art performance in terms of sketch similarity and text alignment.


HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

Zhu, Junzhe, Zhuang, Peiye

arXiv.org Artificial Intelligence

The advancements in automatic text-to-3D generation have been remarkable. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations like Neural Radiance Fields (NeRFs) via latent-space denoising score matching. Yet, these methods often result in artifacts and inconsistencies across different views due to their suboptimal optimization approaches and limited understanding of 3D geometry. Moreover, the inherent constraints of NeRFs in rendering crisp geometry and stable textures usually lead to a two-stage optimization to attain high-resolution details. This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation, all in a single-stage optimization. We compute denoising scores in the text-to-image diffusion model's latent and image spaces. Instead of randomly sampling timesteps (also referred to as noise levels in denoising score matching), we introduce a novel timestep annealing approach that progressively reduces the sampled timestep throughout optimization. To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays. To address texture flickering issues in NeRFs, we introduce a kernel smoothing technique that refines importance sampling weights coarse-to-fine, ensuring accurate and thorough sampling in high-density regions. Extensive experiments demonstrate the superiority of our method over previous approaches, enabling the generation of highly detailed and view-consistent 3D assets through a single-stage training process.
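The timestep annealing idea above can be sketched in a few lines. The linear decay of the sampling ceiling and the `t_min`/`t_max` bounds below are illustrative assumptions for the sketch, not the paper's exact schedule:

```python
import random

random.seed(0)

def annealed_timestep(step, total_steps, t_min=0.02, t_max=0.98):
    """Sample a diffusion timestep whose upper bound decays over optimization,
    so late iterations receive low-noise, fine-detail supervision."""
    hi = t_max - (t_max - t_min) * (step / total_steps)  # decaying ceiling
    return random.uniform(t_min, hi)

# Early in optimization, timesteps span the full noise range;
# near the end, only small (low-noise) timesteps are sampled.
early = [annealed_timestep(0, 1000) for _ in range(100)]
late = [annealed_timestep(999, 1000) for _ in range(100)]
```

Contrast this with standard score distillation, which keeps sampling timesteps uniformly at random throughout optimization.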


ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

Wang, Zhengyi, Lu, Cheng, Wang, Yikai, Bao, Fan, Li, Chongxuan, Su, Hang, Zhu, Jun

arXiv.org Artificial Intelligence

Score distillation sampling (SDS) has shown great promise in text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models, but suffers from over-saturation, over-smoothing, and low-diversity problems. In this work, we propose to model the 3D parameter as a random variable instead of a constant as in SDS and present variational score distillation (VSD), a principled particle-based variational framework to explain and address the aforementioned issues in text-to-3D generation. We show that SDS is a special case of VSD and leads to poor samples with both small and large CFG weights. In comparison, VSD works well with various CFG weights as ancestral sampling from diffusion models and simultaneously improves the diversity and sample quality with a common CFG weight (i.e., $7.5$). We further present various improvements in the design space for text-to-3D such as distillation time schedule and density initialization, which are orthogonal to the distillation algorithm yet not well explored. Our overall approach, dubbed ProlificDreamer, can generate high rendering resolution (i.e., $512\times512$) and high-fidelity NeRF with rich structure and complex effects (e.g., smoke and drops). Further, initialized from NeRF, meshes fine-tuned by VSD are meticulously detailed and photo-realistic. Project page and codes: https://ml.cs.tsinghua.edu.cn/prolificdreamer/


Text-to-3D with Classifier Score Distillation

Yu, Xin, Guo, Yuan-Chen, Li, Yangguang, Liang, Ding, Zhang, Song-Hai, Qi, Xiaojuan

arXiv.org Artificial Intelligence

Creating a high-quality 3D asset remains challenging and expensive, as it requires a high level of expertise. Therefore, automating this process with generative models has become an important problem, which remains difficult due to the scarcity of data and the complexity of 3D representations. Recently, techniques based on Score Distillation Sampling (SDS) (Poole et al., 2022; Lin et al., 2023; Chen et al., 2023; Wang et al., 2023b), also known as Score Jacobian Chaining (SJC) (Wang et al., 2023a), have emerged as a major research direction for text-to-3D generation, as they can produce high-quality and intricate 3D results from diverse text prompts without requiring 3D data for training. The core principle behind SDS is to optimize 3D representations by encouraging their rendered images to move towards high probability density regions conditioned on the text, where the supervision is provided by a pre-trained 2D diffusion model (Ho et al., 2020; Sohl-Dickstein et al., 2015; Rombach et al., 2022; Saharia et al., 2022; Balaji et al., 2022). DreamFusion (Poole et al., 2022) advocates the use of SDS for the optimization of Neural Radiance Fields (NeRF).
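The SDS principle described above reduces to a simple gradient rule: add noise to a rendered image, ask the diffusion model to predict that noise, and push the render so the prediction and the true noise agree, while skipping the backward pass through the diffusion network. A minimal sketch follows; `predict_noise` is a toy stub standing in for a real pretrained text-conditioned model, and the weighting `w(t) = 1 - alpha_bar(t)` is one common choice:

```python
import numpy as np

def predict_noise(noisy_image, t):
    """Stub for a pretrained text-conditioned diffusion model;
    a real model would return its noise estimate eps_hat(x_t, t, text)."""
    return 0.9 * noisy_image  # toy prediction, not a real network

def sds_gradient(rendered, alpha_bars, rng):
    """One Score Distillation Sampling gradient on a rendered image:
    grad = w(t) * (eps_hat - eps), omitting the diffusion U-Net Jacobian."""
    t = int(rng.integers(1, len(alpha_bars)))   # random timestep (noise level)
    eps = rng.standard_normal(rendered.shape)   # Gaussian noise for the render
    a = alpha_bars[t]
    noisy = np.sqrt(a) * rendered + np.sqrt(1.0 - a) * eps
    eps_hat = predict_noise(noisy, t)
    w = 1.0 - a                                 # one common weighting choice
    return w * (eps_hat - eps)

rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.999, 0.01, 1000)  # toy cumulative noise schedule
render = rng.standard_normal((8, 8))         # stand-in for a NeRF rendering
grad = sds_gradient(render, alpha_bars, rng)
render -= 0.1 * grad  # in practice this gradient flows back into the NeRF
```

In a real system the gradient is backpropagated through the differentiable renderer into the 3D representation's parameters rather than applied to the image directly.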


Collaborative Score Distillation for Consistent Visual Synthesis

Kim, Subin, Lee, Kyungmin, Choi, June Suk, Jeong, Jongheon, Sohn, Kihyuk, Shin, Jinwoo

arXiv.org Artificial Intelligence

Generative priors of large-scale text-to-image diffusion models enable a wide range of new generation and editing applications on diverse visual modalities. However, when adapting these priors to complex visual modalities, often represented as multiple images (e.g., video), achieving consistency across a set of images is challenging. In this paper, we address this challenge with a novel method, Collaborative Score Distillation (CSD). CSD is based on the Stein Variational Gradient Descent (SVGD). Specifically, we propose to consider multiple samples as "particles" in the SVGD update and combine their score functions to distill generative priors over a set of images synchronously. Thus, CSD facilitates seamless integration of information across 2D images, leading to a consistent visual synthesis across multiple samples. We show the effectiveness of CSD in a variety of tasks, encompassing the visual editing of panorama images, videos, and 3D scenes. Our results underline the competency of CSD as a versatile method for enhancing inter-sample consistency, thereby broadening the applicability of text-to-image diffusion models.
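The SVGD update that CSD builds on moves a set of particles jointly: a kernel-weighted score term pulls each particle toward high density, while a kernel-gradient term repels particles from one another, preserving diversity across the set. A self-contained 1D sketch, with an RBF kernel and a standard-Gaussian target as toy assumptions:

```python
import numpy as np

def svgd_step(particles, score_fn, step=0.1, h=1.0):
    """One Stein Variational Gradient Descent update:
    phi(x_i) = mean_j [ k(x_j, x_i) * score(x_j) + d/dx_j k(x_j, x_i) ].
    The first term attracts particles to high density; the second repels them
    from each other."""
    x = particles[:, None]                 # shape (n, 1)
    diff = x - x.T                         # diff[j, i] = x_j - x_i
    k = np.exp(-(diff ** 2) / (2 * h))     # RBF kernel matrix
    grad_k = -diff / h * k                 # d k(x_j, x_i) / d x_j
    scores = score_fn(particles)[:, None]  # score at each particle
    phi = (k * scores).mean(axis=0) + grad_k.mean(axis=0)
    return particles + step * phi

rng = np.random.default_rng(1)
pts = rng.normal(5.0, 0.1, size=50)       # particles start far from the target
for _ in range(200):
    pts = svgd_step(pts, lambda x: -x)    # score of N(0, 1) is -x
# pts now cluster near 0 but stay spread out rather than collapsing to a point
```

In CSD the particles are images and the score comes from a text-to-image diffusion model; the closed-form Gaussian score here just keeps the sketch runnable.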


ATT3D: Amortized Text-to-3D Object Synthesis

Lorraine, Jonathan, Xie, Kevin, Zeng, Xiaohui, Lin, Chen-Hsuan, Takikawa, Towaki, Sharp, Nicholas, Lin, Tsung-Yi, Liu, Ming-Yu, Fidler, Sanja, Lucas, James

arXiv.org Artificial Intelligence

Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields. DreamFusion recently achieved high-quality results but requires a lengthy, per-prompt optimization to create 3D objects. To address this, we amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately. With this, we share computation across a prompt set, training in less time than per-prompt optimization. Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooth interpolations between text for novel assets and simple animations.


DreamBooth3D: Subject-Driven Text-to-3D Generation

Raj, Amit, Kaza, Srinivas, Poole, Ben, Niemeyer, Michael, Ruiz, Nataniel, Mildenhall, Ben, Zada, Shiran, Aberman, Kfir, Rubinstein, Michael, Barron, Jonathan, Li, Yuanzhen, Jampani, Varun

arXiv.org Artificial Intelligence

We present DreamBooth3D, an approach to personalize text-to-3D generative models from as few as 3-6 casually captured images of a subject. Our approach combines recent advances in personalizing text-to-image models (DreamBooth) with text-to-3D generation (DreamFusion). We find that naively combining these methods fails to yield satisfactory subject-specific 3D assets due to personalized text-to-image models overfitting to the input viewpoints of the subject. We overcome this through a 3-stage optimization strategy where we jointly leverage the 3D consistency of neural radiance fields together with the personalization capability of text-to-image models. Our method can produce high-quality, subject-specific 3D assets with text-driven modifications such as novel poses, colors and attributes that are not seen in any of the input images of the subject.


Meet Instruct-NeRF2NeRF: An AI Method For Editing 3D Scenes With Text-Instructions - MarkTechPost

#artificialintelligence

It has never been simpler to capture a realistic digital representation of a real-world 3D scene, thanks to the development of effective neural 3D reconstruction techniques. The authors anticipate that, because capture is so user-friendly, recorded 3D content will progressively replace manually generated assets. While the pipelines for converting a real scene into a 3D representation are well established and easily available, many of the additional tools required to work with 3D assets, such as those needed for editing 3D scenes, are still in their infancy. Traditionally, modifying a 3D model by manually sculpting, extruding, and retexturing it required specialized tools and years of skill. The process is significantly more complicated for neural representations, which frequently lack explicit surfaces.