Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections

Neural Information Processing Systems 

Photographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, which hinder accurate scene reconstruction and induce artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle dynamic appearances and eliminate transient objects, their extensive training demands and slow rendering speeds limit practical deployment. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents \textit{Wild-GS}, an innovative adaptation of 3DGS optimized for unconstrained photo collections that preserves its efficiency benefits. Unlike previous methods that model reference features in image space, \textit{Wild-GS} explicitly aligns the pixel appearance features to the corresponding local Gaussians by sampling the triplane extracted from the reference image.
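
To make the idea of sampling a triplane at Gaussian locations concrete, the following is a minimal NumPy sketch, not the implementation used in \textit{Wild-GS}. It assumes the triplane is already available as three axis-aligned feature planes (`xy`, `xz`, `yz`), that Gaussian centers are normalized by a scene bounding box, and that the three sampled features are concatenated; the function names, the bounding-box normalization, and the concatenation are all illustrative assumptions, since the abstract specifies none of them.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at coordinates u, v in [0, 1].

    u indexes the width axis, v indexes the height axis. Returns an (N, C) array.
    """
    h, w, _ = plane.shape
    x = np.clip(u, 0.0, 1.0) * (w - 1)
    y = np.clip(v, 0.0, 1.0) * (h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)  # clamp the right/bottom neighbors at the border
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = plane[y0, x0] * (1.0 - wx) + plane[y0, x1] * wx
    bottom = plane[y1, x0] * (1.0 - wx) + plane[y1, x1] * wx
    return top * (1.0 - wy) + bottom * wy


def sample_triplane(planes, centers, bbox_min, bbox_max):
    """Fetch a local feature for each 3D Gaussian center from three feature planes.

    planes:  dict with keys 'xy', 'xz', 'yz', each an (H, W, C) array (hypothetical layout).
    centers: (N, 3) array of Gaussian centers in world coordinates.
    bbox_min, bbox_max: (3,) arrays bounding the scene (an assumed normalization).
    Returns an (N, 3 * C) array; concatenation is an assumption of this sketch.
    """
    p = (centers - bbox_min) / (bbox_max - bbox_min)  # normalize to [0, 1]^3
    f_xy = bilinear_sample(planes["xy"], p[:, 0], p[:, 1])
    f_xz = bilinear_sample(planes["xz"], p[:, 0], p[:, 2])
    f_yz = bilinear_sample(planes["yz"], p[:, 1], p[:, 2])
    return np.concatenate([f_xy, f_xz, f_yz], axis=1)
```

Each Gaussian thus receives a feature that depends on its own 3D position, in contrast to a single global embedding per reference image. How the planes are produced from the reference image is outside the scope of this sketch.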