nerf
ROGR: Relightable 3D Objects using Generative Relighting
We introduce ROGR, a novel approach that reconstructs a relightable 3D model of an object captured from multiple views, driven by a generative relighting model that simulates the effects of placing the object under novel environment illuminations. Our method samples the appearance of the object under multiple lighting environments, creating a dataset that is used to train a Neural Radiance Field conditioned on environmental lighting.
Can NeRFs "See" without Cameras?
Neural Radiance Fields (NeRFs) have been remarkably successful at synthesizing novel views of 3D scenes by optimizing a volumetric scene function. This scene function models how optical rays bring color information from a 3D object to the camera pixels. Radio frequency (RF) or audio signals can also be viewed as a vehicle for delivering information about the environment to a sensor. However, unlike camera pixels, an RF/audio sensor receives a mixture of signals that contain many environmental reflections (also called "multipath"). Is it still possible to infer the environment using such multipath signals? We show that with redesign, NeRFs can be taught to learn from multipath signals, and thereby "see" the environment. As a grounding application, we aim to infer the indoor floorplan of a home from sparse WiFi measurements made at multiple locations inside the home. Although this is a difficult inverse problem, our implicitly learnt floorplans look promising and enable forward applications, such as indoor signal prediction and basic ray tracing.
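For readers unfamiliar with the "volumetric scene function" mentioned above, the following is a minimal sketch of the standard NeRF rendering quadrature for a single optical ray. It is not taken from this paper and does not model multipath RF signals; the function name `render_ray` and its argument names are hypothetical, chosen only for illustration.

```python
import math

def render_ray(densities, colors, deltas):
    """Composite per-sample densities and RGB colors along one camera ray.

    densities: non-negative floats, one volume density per sample
    colors:    (r, g, b) tuples, one per sample
    deltas:    distances between adjacent samples along the ray
    (All names are illustrative assumptions, not from the abstract.)
    """
    transmittance = 1.0          # fraction of light that has not been absorbed so far
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        weight = transmittance * alpha           # contribution of this sample
        weights.append(weight)
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha             # light remaining after the segment
    return tuple(pixel), weights

# Usage: empty space in front of a nearly opaque red surface.
pixel, weights = render_ray(
    densities=[0.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[1.0, 1.0],
)
```

Each camera pixel here is explained by a single ray, which is the assumption that breaks down for RF/audio sensors receiving a mixture of reflected signals.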
Multimodal LiDAR-Camera Novel View Synthesis with Unified Pose-free Neural Fields
Pose-free Neural Radiance Field (NeRF) aims at novel view synthesis (NVS) without relying on accurate poses, exhibiting significant practical value. Images and LiDAR point clouds are two pivotal modalities in autonomous driving scenarios. While demonstrating impressive performance, single-modality pose-free NeRFs often suffer from local optima due to the limited geometric information provided by dense image textures or the sparse, textureless nature of point clouds. Although prior methods have explored the complementary strengths of both modalities, they have only leveraged inherently sparse point clouds for discrete, non-pixel-wise depth supervision, and are limited to NVS of images. As a result, a Multimodal Unified Pose-free framework remains notably absent.
KaRF: Weakly-Supervised Kolmogorov-Arnold Networks-based Radiance Fields for Local Color Editing
Recent advancements have suggested that neural radiance fields (NeRFs) show great potential in color editing within the 3D domain. However, most existing NeRF-based editing methods continue to face significant challenges in local region editing, which usually lead to imprecise local object boundaries, difficulties in maintaining multi-view consistency, and over-reliance on annotated data. To address these limitations, in this paper, we propose a novel weakly-supervised method called KaRF for local color editing, which facilitates high-fidelity and realistic appearance edits in arbitrary regions of 3D scenes. At the core of the proposed KaRF approach is a unified two-stage Kolmogorov-Arnold Networks (KANs)-based radiance fields framework, comprising a segmentation stage followed by a local recoloring stage. This architecture seamlessly integrates geometric priors from NeRF to achieve weakly-supervised learning, leading to superior performance. More specifically, we propose a residual adaptive gating KAN structure, which integrates KAN with residual connections, adaptive parameters, and gating mechanisms to effectively enhance segmentation accuracy and refine specific editing effects. Additionally, we propose a palette-adaptive reconstruction loss, which can enhance the accuracy of additive mixing results. Extensive experiments demonstrate that the proposed KaRF algorithm significantly outperforms many state-of-the-art methods both qualitatively and quantitatively. Our code and more results are available at: https://github.com/PaiDii/KARF.git.
NeRF-IBVS: Visual Servo Based on NeRF for Visual Localization and Navigation
Visual localization is a fundamental task in computer vision and robotics. Training existing visual localization methods requires a large number of posed images to generalize to novel views, while state-of-the-art methods generally require ground truth 3D labels for supervision. However, acquiring a large number of posed images and 3D labels in the real world is challenging and costly. In this paper, we present a novel visual localization method that achieves accurate localization while using only a few posed images compared to other localization methods. To achieve this, we first use a few posed images with coarse pseudo-3D labels provided by NeRF to train a coordinate regression network.
MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps
The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. Prior work relies on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet, which utilizes plane sweep for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. We select the multiple locations with the highest scores in the probability volume for each pixel and use their probability scores to indicate the confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computation overhead. Extensive experiments on the ScanNet and ARKitScenes datasets show the superiority of our model.
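The probabilistic sampling and soft weighting step described above can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the authors' implementation: the function name `topk_soft_placement`, the choice of `k`, and the representation of a pixel feature as a plain list are all assumptions made for illustration.

```python
def topk_soft_placement(depth_probs, feature, k=2):
    """Place one pixel's feature at its k most probable depth planes.

    depth_probs: per-plane probabilities for this pixel (one entry per depth plane)
    feature:     the pixel's feature vector, as a list of floats
    k:           number of depth planes to keep (hypothetical default)

    Returns a list of (plane_index, weighted_feature) pairs, where the feature
    is scaled by the plane's probability so that it acts as a confidence.
    """
    # Rank depth planes from most to least probable.
    ranked = sorted(range(len(depth_probs)),
                    key=lambda d: depth_probs[d],
                    reverse=True)
    placements = []
    for d in ranked[:k]:
        weight = depth_probs[d]                      # soft weight = probability score
        placements.append((d, [weight * f for f in feature]))
    return placements

# Usage: a pixel whose depth distribution peaks at plane 1, then plane 3.
placements = topk_soft_placement([0.1, 0.6, 0.05, 0.25], [2.0, 4.0], k=2)
```

Keeping several candidate planes per pixel, rather than only the single most likely one, lets an uncertain depth estimate still contribute features near the true surface while down-weighting the less likely locations.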
Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections
Photographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, challenging accurate scene reconstruction and inducing artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle the dynamic appearances and eliminate transient objects, their extensive training demands and slow rendering speeds limit practical deployments. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections while preserving its efficiency benefits.
GeoNLF: Geometry guided Pose-Free Neural LiDAR Fields
Although recent efforts have extended Neural Radiance Field (NeRF) into LiDAR point cloud synthesis, the majority of existing works exhibit a strong dependence on precomputed poses. However, point cloud registration methods struggle to achieve precise global pose estimation, whereas previous pose-free NeRFs overlook geometric consistency in global reconstruction. In light of this, we explore the geometric insights of point clouds, which provide explicit registration priors for reconstruction. Based on this, we propose Geometry guided Neural LiDAR Fields (GeoNLF), a hybrid framework that alternates between global neural reconstruction and pure geometric pose optimization. Furthermore, NeRFs tend to overfit individual frames and easily get stuck in local minima under sparse-view inputs. To tackle this issue, we develop a selective-reweighting strategy and introduce geometric constraints for robust optimization. Extensive experiments on the NuScenes and KITTI-360 datasets demonstrate the superiority of GeoNLF in both novel view synthesis and multi-view registration of low-frequency large-scale point clouds.