illumination
FlareX: A Physics-Informed Dataset for Lens Flare Removal via 2D Synthesis and 3D Rendering
Lens flare occurs when shooting towards strong light sources, significantly degrading the visual quality of images. Due to the difficulty in capturing flare-corrupted and flare-free image pairs in the real world, existing datasets are typically synthesized in 2D by overlaying artificial flare templates onto background images. However, the lack of flare diversity in templates and the neglect of physical principles in the synthesis process hinder models trained on these datasets from generalizing well to real-world scenarios. To address these challenges, we propose a new physics-informed method for flare data generation, which consists of three stages: parameterized template creation, laws-of-illumination-aware 2D synthesis, and physical-engine-based 3D rendering. The result is a miXed flare dataset that incorporates both 2D and 3D perspectives, namely FlareX. This dataset offers 9,500 2D templates derived from 95 flare patterns and 3,000 flare image pairs rendered from 60 3D scenes. Furthermore, we design a masking approach to obtain real-world flare-free images from their corrupted counterparts, so that model performance can be measured on real-world images. Extensive experiments demonstrate the effectiveness of our method and dataset.
BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading
We introduce BecomingLit, a novel method for reconstructing relightable, high-resolution head avatars that can be rendered from novel viewpoints at interactive rates. To this end, we propose a new low-cost light stage capture setup, tailored specifically towards capturing faces. Using this setup, we collect a novel dataset consisting of diverse multi-view sequences of numerous subjects under varying illumination conditions and facial expressions. By leveraging our new dataset, we introduce a new relightable avatar representation based on 3D Gaussian primitives that we animate with a parametric head model and an expression-dependent dynamics module. We propose a new hybrid neural shading approach, combining a neural diffuse BRDF with an analytical specular term. Our method reconstructs disentangled materials from our dynamic light stage recordings and enables all-frequency relighting of our avatars with both point lights and environment maps. In addition, our avatars can easily be animated and controlled from monocular videos. We validate our approach in extensive experiments on our dataset, where we consistently outperform existing state-of-the-art methods in relighting and reenactment by a significant margin.
Fast and Unified Image and Video with Physics Plausible Feedback
Relighting is a crucial task with both practical demand and artistic value, and recent diffusion models have shown strong potential by enabling rich and controllable lighting effects. However, as they are typically optimized in semantic latent space, where proximity does not guarantee physical correctness in visual space, they often produce unrealistic results, such as overexposed highlights, misaligned shadows, and incorrect occlusions. We address this with UniLumos, a unified relighting framework for both images and videos that brings RGB-space geometry feedback into a flow-matching backbone. By supervising the model with depth and normal maps extracted from its outputs, we explicitly align lighting effects with the scene structure, enhancing physical plausibility. Nevertheless, this feedback requires high-quality outputs for supervision in visual space, making standard multi-step denoising computationally expensive. To mitigate this, we employ path consistency learning, allowing supervision to remain effective even under few-step training regimes. To enable fine-grained relighting control and supervision, we design a structured six-dimensional annotation protocol capturing core illumination attributes. Building upon this, we propose LumosBench, a disentangled attribute-level benchmark that evaluates lighting controllability via large vision-language models, enabling automatic and interpretable assessment of relighting precision across individual dimensions. Extensive experiments demonstrate that UniLumos achieves state-of-the-art relighting quality with significantly improved physical consistency, while delivering a 20x speedup for both image and video relighting.
ROGR: Relightable 3D Objects using Generative Relighting
We introduce ROGR, a novel approach that reconstructs a relightable 3D model of an object captured from multiple views, driven by a generative relighting model that simulates the effects of placing the object under novel environment illuminations. Our method samples the appearance of the object under multiple lighting environments, creating a dataset that is used to train a Neural Radiance Field conditioned on environmental lighting.
By integrating the lighting, appearance, and geometry cues within a unified diffusion architecture, IllumiCraft generates temporally coherent videos aligned with user-defined prompts. It supports background-conditioned and text-conditioned video relighting and provides better fidelity than existing controllable video generation methods.
LuxDiT: Lighting Estimation with Video Diffusion Transformer
Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation finetuning strategy using a collected dataset of HDR panoramas.
MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference
Modeling reflections from 2D images is essential for photorealistic rendering and novel view synthesis. Recent approaches enhance Gaussian primitives with reflection-related material attributes to enable physically based rendering (PBR) with Gaussian Splatting. However, the material inference often lacks sufficient constraints, especially under limited environment modeling, resulting in illumination aliasing and reduced generalization. In this work, we revisit the problem from a multi-view perspective and show that multi-view consistent material inference with more physically based environment modeling is key to learning accurate reflections with Gaussian Splatting. To this end, we enforce 2D Gaussians to produce multi-view consistent material maps during deferred shading. We also track photometric variations across views to identify highly reflective regions, which serve as strong priors for reflection strength terms. To handle indirect illumination caused by inter-object occlusions, we further introduce an environment modeling strategy through ray tracing with 2DGS, enabling photorealistic rendering of indirect radiance. Experiments on widely used benchmarks show that our method faithfully recovers both illumination and geometry, achieving state-of-the-art rendering quality in novel view synthesis.
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
We introduce Stanford-ORB, a new real-world 3D Object inverse Rendering Benchmark. Recent advances in inverse rendering have enabled a wide range of real-world applications in 3D content generation, moving rapidly from research and commercial use cases to consumer devices. While the results continue to improve, there is no real-world benchmark that can quantitatively assess and compare the performance of various inverse rendering methods. Existing real-world datasets typically consist only of the shape and multi-view images of objects, which are not sufficient for evaluating the quality of material recovery and object relighting. Methods capable of recovering material and lighting often resort to synthetic data for quantitative evaluation, which on the other hand does not guarantee generalization to complex real-world environments. We introduce a new dataset of real-world objects captured under a variety of natural scenes with ground-truth 3D scans, multi-view images, and environment lighting. Using this dataset, we establish the first comprehensive real-world evaluation benchmark for object inverse rendering tasks from in-the-wild scenes and compare the performance of various existing methods. All data, code, and models can be accessed at https://stanfordorb.github.io/.
Supplementary Material for Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition
Our main reconstruction loss is an MSE between the rendered color c and the corresponding pixel in the input image. This loss is then exponentially faded over 100,000 steps to a cosine-weighted MSE: $\left(x\,(\omega_o \cdot n) - \hat{x}\,(\omega_o \cdot n)\right)^2$. This weighting tends to achieve better BRDF fitting results [4] for two reasons: harsh grazing highlights from the Fresnel effect are weighted less than regular samples, and our approximated rendering model is least accurate at grazing angles. The reason for this fading loss scheme is that the normals n are not reliable in the early stages of training.
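The fading scheme described above can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the text gives the 100,000-step fade and the cosine weighting, but not the exact schedule. The blend form, the `final_weight` floor, the clamping of the cosine to [0, 1], and all function names here are hypothetical.

```python
import numpy as np

FADE_STEPS = 100_000  # fade length stated in the text


def fade_weight(step, fade_steps=FADE_STEPS, final_weight=1e-3):
    """Weight of the plain MSE term.

    Assumed schedule: exponential decay from 1 at step 0 down to
    `final_weight` at `fade_steps`, constant afterwards.
    """
    t = min(max(step / fade_steps, 0.0), 1.0)
    return float(final_weight ** t)


def mse(pred, target):
    """Plain mean squared error between rendered and reference colors."""
    return float(np.mean((pred - target) ** 2))


def cosine_weighted_mse(pred, target, normals, view_dirs):
    """MSE where both colors are scaled by the cosine (omega_o . n).

    `normals` and `view_dirs` are unit vectors of shape (N, 3).
    Clamping the cosine to [0, 1] is an assumption of this sketch.
    """
    cos = np.sum(normals * view_dirs, axis=-1, keepdims=True)
    cos = np.clip(cos, 0.0, 1.0)
    return float(np.mean((pred * cos - target * cos) ** 2))


def reconstruction_loss(step, pred, target, normals, view_dirs):
    """Blend from plain MSE (early) to cosine-weighted MSE (late)."""
    w = fade_weight(step)
    return w * mse(pred, target) + (1.0 - w) * cosine_weighted_mse(
        pred, target, normals, view_dirs
    )
```

At step 0 the loss equals the plain MSE, so unreliable early normals have no influence. Once the fade has run its course, samples seen at grazing angles, where the cosine is near zero, contribute almost nothing.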