AITopics | Luan, Fujun

Collaborating Authors

Luan, Fujun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

EP-CFG: Energy-Preserving Classifier-Free Guidance

Zhang, Kai, Luan, Fujun, Bi, Sai, Zhang, Jianming

arXiv.org Artificial IntelligenceDec-13-2024

Classifier-free guidance (CFG) (Ho & Salimans, 2022) is widely used in diffusion models (Ho et al., 2020; Song et al., 2020) for text-guided generation, but often leads to over-contrast and oversaturation artifacts. We propose EP-CFG, a simple yet effective CFG solution that preserves the energy distribution of the conditional prediction while maintaining strong semantic alignment. Usually, the CFG strength is around 7-10 (Rombach et al., 2022) in modern text-to-image models for sampling high-quality visuals. However, it is wellknown that such high CFG strength can lead to the well-known over-contrast and over-saturation artifacts (Ho & Salimans, 2022). Concurrent work (Sadat et al., 2024) proposed APG to address oversaturation through update term decomposition.

artificial intelligence, cfg level, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2412.09966

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.90)
Information Technology > Artificial Intelligence > Vision (0.67)

Add feedback

RandAR: Decoder-only Autoregressive Visual Generation in Random Orders

Pang, Ziqi, Zhang, Tianyuan, Luan, Fujun, Man, Yunze, Tan, Hao, Zhang, Kai, Freeman, William T., Wang, Yu-Xiong

arXiv.org Artificial IntelligenceDec-2-2024

We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders. Unlike previous decoder-only AR models that rely on a predefined generation order, RandAR removes this inductive bias, unlocking new capabilities in decoder-only generation. Our essential design enables random order by inserting a "position instruction token" before each image token to be predicted, representing the spatial location of the next image token. Trained on randomly permuted token sequences -- a more challenging task than fixed-order generation, RandAR achieves comparable performance to its conventional raster-order counterpart. More importantly, decoder-only transformers trained from random orders acquire new capabilities. For the efficiency bottleneck of AR models, RandAR adopts parallel decoding with KV-Cache at inference time, enjoying 2.5x acceleration without sacrificing generation quality. Additionally, RandAR supports inpainting, outpainting and resolution extrapolation in a zero-shot manner. We hope RandAR inspires new directions for decoder-only visual generation models and broadens their applications across diverse scenarios. Our project page is at https://rand-ar.github.io/.

large language model, machine learning, randar, (17 more...)

arXiv.org Artificial Intelligence

2412.01827

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

Kuang, Zhengfei, Zhang, Tianyuan, Zhang, Kai, Tan, Hao, Bi, Sai, Hu, Yiwei, Xu, Zexiang, Hasan, Milos, Wetzstein, Gordon, Luan, Fujun

arXiv.org Artificial IntelligenceNov-26-2024

We present Buffer Anytime, a framework for estimation of depth and normal maps (which we call geometric buffers) from video that eliminates the need for paired video--depth and video--normal training data. Instead of relying on large-scale annotated video datasets, we demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints. Our zero-shot training strategy combines state-of-the-art image estimation models based on optical flow smoothness through a hybrid loss function, implemented via a lightweight temporal attention architecture. Applied to leading image models like Depth Anything V2 and Marigold-E2E-FT, our approach significantly improves temporal consistency while maintaining accuracy. Experiments show that our method not only outperforms image-based approaches but also achieves results comparable to state-of-the-art video models trained on large-scale paired video datasets, despite using no such paired video data.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.17249

Country:

North America > United States > Massachusetts (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Jin, Haian, Jiang, Hanwen, Tan, Hao, Zhang, Kai, Bi, Sai, Zhang, Tianyuan, Luan, Fujun, Snavely, Noah, Xu, Zexiang

arXiv.org Artificial IntelligenceOct-22-2024

We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens, functioning as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which directly maps input images to novelview outputs, completely eliminating intermediate scene representations. Both models bypass the 3D inductive biases used in previous methods--from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps)--addressing novel view synthesis with a fully data-driven approach. While the encoder-decoder model offers faster inference due to its independent latent representation, the decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets demonstrate that both LVSM variants achieve state-of-the-art novel view synthesis quality. Notably, our models surpass all previous methods even with reduced computational resources (1-2 GPUs). Novel view synthesis is a long-standing challenge in vision and graphics. For decades, the community has generally relied on various 3D inductive biases, incorporating 3D priors and handcrafted structures to simplify the task and improve synthesis quality. Other methods have also built generalizable networks to estimate these representations or directly generate novel-view images in a feed-forward manner, often incorporating additional 3D inductive biases, such as projective epipolar lines or plane-sweep volumes, in their architecture designs (Wang et al., 2021a; Yu et al., 2021; Chen et al., 2021; Suhail et al., 2022b; Charatan et al., 2024; Chen et al., 2024). While effective, these 3D inductive biases inherently limit model flexibility, constraining their adaptability to more diverse and complex scenarios that do not align with predefined priors or handcrafted structures.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.17242

Country: North America > United States (0.93)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models

Zhang, Tianyuan, Kuang, Zhengfei, Jin, Haian, Xu, Zexiang, Bi, Sai, Tan, Hao, Zhang, He, Hu, Yiwei, Hasan, Milos, Freeman, William T., Zhang, Kai, Luan, Fujun

arXiv.org Artificial IntelligenceOct-10-2024

We propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. Unlike prior inverse rendering methods requiring dense captures and slow optimization, often causing artifacts like incorrect highlights or shadow baking, RelitLRM adopts a feed-forward transformer-based model with a novel combination of a geometry reconstructor and a relightable appearance generator based on diffusion. The model is trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations. This architecture design enables to effectively decompose geometry and appearance, resolve the ambiguity between material and lighting, and capture the multi-modal distribution of shadows and specularity in the relit appearance. We show our sparse-view feed-forward RelitLRM offers competitive relighting results to state-of-the-art dense-view optimization-based baselines while being significantly faster. Our project page is available at: https://relit-lrm.github.io/.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2410.06231

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Neural Gaffer: Relighting Any Object via Diffusion

Jin, Haian, Li, Yuan, Luan, Fujun, Xiangli, Yuanbo, Bi, Sai, Zhang, Kai, Xu, Zexiang, Sun, Jin, Snavely, Noah

arXiv.org Artificial IntelligenceJun-11-2024

Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BRDFs, which can be inaccurate or under-expressive. In this work, we propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer, that takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel environmental lighting condition, simply by conditioning an image generator on a target environment map, without an explicit scene decomposition. Our method builds on a pre-trained diffusion model, and fine-tunes it on a synthetic relighting dataset, revealing and harnessing the inherent understanding of lighting present in the diffusion model. We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy. Moreover, by combining with other generative methods, our model enables many downstream 2D tasks, such as text-based relighting and object insertion. Our model can also operate as a strong relighting prior for 3D tasks, such as relighting a radiance field.

artificial intelligence, diffusion model, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2406.0752

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

I$^2$-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs

Zhu, Jingsen, Huo, Yuchi, Ye, Qi, Luan, Fujun, Li, Jifan, Xi, Dianbing, Wang, Lisha, Tang, Rui, Hua, Wei, Bao, Hujun, Wang, Rui

arXiv.org Artificial IntelligenceMar-29-2023

In this work, we present I$^2$-SDF, a new method for intrinsic indoor scene reconstruction and editing using differentiable Monte Carlo raytracing on neural signed distance fields (SDFs). Our holistic neural SDF-based framework jointly recovers the underlying shapes, incident radiance and materials from multi-view images. We introduce a novel bubble loss for fine-grained small objects and error-guided adaptive sampling scheme to largely improve the reconstruction quality on large-scale indoor scenes. Further, we propose to decompose the neural radiance field into spatially-varying material of the scene as a neural field through surface-based, differentiable Monte Carlo raytracing and emitter semantic segmentations, which enables physically based and photorealistic scene relighting and editing applications. Through a number of qualitative and quantitative experiments, we demonstrate the superior quality of our method on indoor scene reconstruction, novel view synthesis, and scene editing compared to state-of-the-art baselines.

artificial intelligence, machine learning, reconstruction, (18 more...)

arXiv.org Artificial Intelligence

2303.07634

Country: Asia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Learning-based Inverse Rendering of Complex Indoor Scenes with Differentiable Monte Carlo Raytracing

Zhu, Jingsen, Luan, Fujun, Huo, Yuchi, Lin, Zihao, Zhong, Zhihua, Xi, Dianbing, Zheng, Jiaxiang, Tang, Rui, Bao, Hujun, Wang, Rui

arXiv.org Artificial IntelligenceNov-23-2022

Indoor scenes typically exhibit complex, spatially-varying appearance from global illumination, making inverse rendering a challenging ill-posed problem. This work presents an end-to-end, learning-based inverse rendering framework incorporating differentiable Monte Carlo raytracing with importance sampling. The framework takes a single image as input to jointly recover the underlying geometry, spatially-varying lighting, and photorealistic materials. Specifically, we introduce a physically-based differentiable rendering layer with screen-space ray tracing, resulting in more realistic specular reflections that match the input photo. In addition, we create a large-scale, photorealistic indoor scene dataset with significantly richer details like complex furniture and dedicated decorations. Further, we design a novel out-of-view lighting network with uncertainty-aware refinement leveraging hypernetwork-based neural radiance fields to predict lighting outside the view of the input photo. Through extensive evaluations on common benchmark datasets, we demonstrate superior inverse rendering quality of our method compared to state-of-the-art baselines, enabling various applications such as complex object insertion and material editing with high fidelity. Code and data will be made available at \url{https://jingsenzhu.github.io/invrend}.

artificial intelligence, dataset, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2211.03017

Country:

North America > United States (0.30)
Asia (0.29)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback