novel view image
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
He, Xu, Li, Xiaoyu, Kang, Di, Ye, Jiangnan, Zhang, Chaopeng, Chen, Liyang, Gao, Xiangjun, Zhang, Han, Wu, Zhiyong, Zhuang, Haolin
Existing works in single-image human reconstruction suffer from weak generalizability due to insufficient training data, or from 3D inconsistencies due to a lack of comprehensive multi-view knowledge. In this paper, we introduce MagicMan, a human-specific multi-view diffusion model designed to generate high-quality novel view images from a single reference image. At its core, we leverage a pre-trained 2D diffusion model as the generative prior for generalizability, with the parametric SMPL-X model as the 3D body prior to promote 3D awareness. To tackle the critical challenge of maintaining consistency while achieving dense multi-view generation for improved 3D human reconstruction, we first introduce hybrid multi-view attention to facilitate both efficient and thorough information interchange across different views. Additionally, we present a geometry-aware dual branch that performs concurrent generation in both RGB and normal domains, further enhancing consistency via geometry cues. Finally, to address ill-shaped results arising from inaccurate SMPL-X estimates that conflict with the reference image, we propose a novel iterative refinement strategy that progressively improves SMPL-X accuracy while enhancing the quality and consistency of the generated multi-views. Extensive experimental results demonstrate that our method significantly outperforms existing approaches in both novel view synthesis and subsequent 3D human reconstruction tasks.
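The abstract does not spell out the hybrid multi-view attention, but its cross-view component can be illustrated with a short sketch. Below is a minimal PyTorch sketch; the module name, tensor shapes, and the grouping-by-spatial-token scheme are illustrative assumptions, not the authors' implementation. Tokens at each spatial location attend to the corresponding tokens in all other views:

```python
# Minimal sketch of cross-view attention (assumed design, not MagicMan's code).
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, views, tokens, dim) -- per-view feature tokens
        b, v, t, d = x.shape
        # Group by spatial token so each location attends across views.
        y = x.permute(0, 2, 1, 3).reshape(b * t, v, d)
        y = self.norm(y)
        out, _ = self.attn(y, y, y)
        out = out.reshape(b, t, v, d).permute(0, 2, 1, 3)
        return x + out  # residual connection

# Usage: 2 scenes, 8 views, 256 spatial tokens, 320-dim features.
feats = torch.randn(2, 8, 256, 320)
attn = CrossViewAttention(dim=320)
print(attn(feats).shape)  # torch.Size([2, 8, 256, 320])
```

In practice such a layer would presumably be interleaved with ordinary within-view self-attention, which is what would make the scheme "hybrid": cheap per-view attention for detail, plus the cross-view exchange above for consistency.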
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
Chou, Zi-Ting, Huang, Sheng-Yu, Liu, I-Jieh, Wang, Yu-Chiang Frank
Utilizing multi-view inputs to synthesize novel-view images, Neural Radiance Fields (NeRF) have emerged as a popular research topic in 3D vision. In this work, we introduce a Generalizable Semantic Neural Radiance Field (GSNeRF), which uniquely takes image semantics into the synthesis process so that both novel view images and the associated semantic maps can be produced for unseen scenes. Our GSNeRF is composed of two stages: Semantic Geo-Reasoning and Depth-Guided Visual Rendering. The former observes multi-view image inputs to extract semantic and geometry features from a scene. Guided by the resulting image geometry information, the latter performs both image and semantic rendering with improved performance. Our experiments not only confirm that GSNeRF performs favorably against prior works on both novel-view image and semantic segmentation synthesis, but also verify the effectiveness of our sampling strategy for visual rendering.
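GSNeRF's second stage renders both images and semantic maps; the core operation behind any such joint rendering is standard NeRF-style volume compositing applied to color and semantic logits alike. A minimal sketch follows, using the generic NeRF formulation; the function name and shapes are assumptions, not GSNeRF's code:

```python
# Jointly volume-render color and semantics along one ray (generic NeRF math).
import torch

def render_ray(sigma, rgb, sem_logits, deltas):
    """sigma: (S,) densities, rgb: (S, 3), sem_logits: (S, C), deltas: (S,) sample spacings."""
    alpha = 1.0 - torch.exp(-sigma * deltas)                 # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0
    )                                                        # transmittance T_i
    weights = alpha * trans                                  # w_i = T_i * alpha_i
    color = (weights[:, None] * rgb).sum(dim=0)              # rendered pixel color
    semantics = (weights[:, None] * sem_logits).sum(dim=0)   # rendered semantic logits
    return color, semantics

# Usage: 64 samples per ray, 20 semantic classes.
color, semantics = render_ray(
    torch.rand(64), torch.rand(64, 3), torch.randn(64, 20), torch.full((64,), 0.03)
)
```

The same weights w_i composite both quantities, which is plausibly why the geometry estimated in the first stage can guide where samples are placed along each ray in the second.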
DreamSparse: Escaping from Plato's Cave with 2D Frozen Diffusion Model Given Sparse Views
Yoo, Paul, Guo, Jiaxian, Matsuo, Yutaka, Gu, Shixiang Shane
Synthesizing novel view images from a few views is a challenging but practical problem. Existing methods often struggle to produce high-quality results or necessitate per-object optimization in such few-view settings due to the insufficient information provided. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel view images. 2D diffusion models, nevertheless, lack 3D awareness, leading to distorted image synthesis and compromising identity. To address these problems, we propose DreamSparse, a framework that enables a frozen pre-trained diffusion model to generate geometry- and identity-consistent novel view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. Subsequently, a spatial guidance model is introduced to convert these 3D feature maps into spatial information for the generative process. This information is then used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without any fine-tuning. Leveraging the strong image priors in pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object- and scene-level images and generalizing to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel view images from sparse views and outperforms baselines on both trained and open-set category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.
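DreamSparse's spatial guidance model injects rendered 3D features into a frozen denoiser. Below is a schematic sketch of one way to do this, in the ControlNet style of zero-initialized residual projections; the module name and wiring are illustrative assumptions, not the paper's code:

```python
# Sketch: project rendered 3D feature maps into residuals for a frozen U-Net block.
import torch
import torch.nn as nn

class SpatialGuidance(nn.Module):
    """Maps rendered 3D feature maps to residuals added onto frozen U-Net activations."""
    def __init__(self, feat_ch: int, unet_ch: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(feat_ch, unet_ch, 3, padding=1),
            nn.SiLU(),
            # Zero-initialized output layer so guidance starts as a no-op,
            # as in ControlNet-style conditioning.
            nn.Conv2d(unet_ch, unet_ch, 1),
        )
        nn.init.zeros_(self.proj[-1].weight)
        nn.init.zeros_(self.proj[-1].bias)

    def forward(self, unet_feat: torch.Tensor, spatial_feat: torch.Tensor) -> torch.Tensor:
        # unet_feat: activation of the frozen denoiser at one resolution
        # spatial_feat: rendered 3D-prior feature map at the same resolution
        return unet_feat + self.proj(spatial_feat)
```

Under this design only the guidance module is trained; because the final projection starts at zero, guidance begins as a no-op, so the frozen backbone's image prior is preserved while 3D spatial information is gradually folded in.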
New machine-learning approach brings digital photos back to life
Every day, billions of photos and videos are posted to social media. The problem with standard images taken by a smartphone or digital camera is that they capture a scene from only a single point of view, whereas in reality we can move around a scene and observe it from different viewpoints. Computer scientists are working to give users an immersive experience that would allow them to observe a scene from different viewpoints, but doing so currently requires specialized camera equipment that is not readily accessible to the average person. To make the process easier, Dr. Nima Kalantari, professor in the Department of Computer Science and Engineering at Texas A&M University, and graduate student Qinbo Li have developed a machine-learning-based approach that allows users to take a single photo and use it to generate novel views of the scene.