LuxDiT: Lighting Estimation with Video Diffusion Transformer

Jun-14-2026, 12:12:23 GMT–Neural Information Processing Systems

Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. Learning-based approaches are constrained by the scarcity of ground-truth HDR environment maps, which are expensive to capture and limited in diversity. While recent generative models offer strong priors for image synthesis, lighting estimation remains difficult due to its reliance on indirect visual cues, the need to infer global (non-local) context, and the recovery of high-dynamic-range outputs. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input. Trained on a large synthetic dataset with diverse lighting conditions, our model learns to infer illumination from indirect visual cues and generalizes effectively to real-world scenes. To improve semantic alignment between the input and the predicted environment map, we introduce a low-rank adaptation fine-tuning strategy using a collected dataset of HDR panoramas.

artificial intelligence, machine learning, natural language

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Media (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (1.00)
