Dynamic Focused Masking for Embodied Occupancy Prediction
Neural Information Processing Systems
Visual autoregressive modeling has recently demonstrated potential in image tasks by enabling coarse-to-fine, next-level prediction. Most indoor 3D occupancy prediction methods, however, continue to rely on dense voxel grids and convolution-heavy backbones, which incur high computational costs when applying such coarse-to-fine frameworks. In contrast, cost-efficient alternatives based on Gaussian representations, particularly in the context of multi-scale autoregression, remain underexplored. To bridge this gap, we propose DFGauss, a Dynamic Focused masking framework for multi-scale 3D Gaussian representation. Unlike conventional approaches that refine voxel volumes or 2D projections, DFGauss operates directly in the 3D Gaussian parameter space, progressively refining representations across resolutions under hierarchical supervision. Each finer-scale Gaussian is conditioned on its coarser-level counterpart, forming a scale-wise autoregressive process. To further enhance efficiency, we introduce an importance-guided refinement strategy that selectively propagates informative Gaussians across scales, enabling spatially adaptive detail modeling. Experiments on 3D occupancy benchmarks demonstrate that DFGauss achieves competitive performance, highlighting the promise of autoregressive modeling for scalable 3D occupancy prediction.
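The importance-guided, coarse-to-fine refinement described above can be illustrated with a minimal NumPy sketch. It is not the DFGauss implementation: the abstract does not specify how importance is scored, how many finer Gaussians each coarse one produces, or what happens to unselected Gaussians. The function names `select_important` and `refine_scale`, the parameters `keep_ratio` and `children`, the random offset standing in for a learned residual, and the choice to pass unselected Gaussians through unchanged are all assumptions made for illustration.

```python
import numpy as np


def select_important(importance, keep_ratio):
    """Indices of the top `keep_ratio` fraction of Gaussians by importance score."""
    k = max(1, int(round(keep_ratio * importance.shape[0])))
    # argsort is ascending, so the last k entries are the highest scores.
    return np.argsort(importance)[-k:]


def refine_scale(means, scales, importance, keep_ratio=0.5, children=2, rng=None):
    """One hypothetical coarse-to-fine step over Gaussian parameters.

    means, scales: (N, 3) arrays; importance: (N,) array of scores.
    Selected Gaussians are split into `children` finer ones, each
    conditioned on (derived from) its coarse parent.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    idx = select_important(importance, keep_ratio)

    # Each child starts from its parent's parameters.
    parent_means = np.repeat(means[idx], children, axis=0)
    parent_scales = np.repeat(scales[idx], children, axis=0)

    # Assumption: a random offset within the parent's extent stands in
    # for the residual a trained network would predict.
    child_means = parent_means + rng.normal(size=parent_means.shape) * parent_scales
    child_scales = parent_scales / children  # finer Gaussians cover less space

    # Assumption: unselected Gaussians are kept unchanged at the next scale.
    keep = np.ones(means.shape[0], dtype=bool)
    keep[idx] = False
    new_means = np.concatenate([means[keep], child_means], axis=0)
    new_scales = np.concatenate([scales[keep], child_scales], axis=0)
    return new_means, new_scales
```

Calling `refine_scale` repeatedly, feeding each output into the next call together with fresh importance scores, gives a scale-wise loop in which computation concentrates on the selected regions while the rest of the scene stays at coarse resolution.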
Jun-23-2026, 01:42:35 GMT
- Country:
- North America > United States > New Jersey (0.28)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Representation & Reasoning (1.00)
- Robots (0.68)
- Machine Learning > Neural Networks (0.68)
- Natural Language > Large Language Model (0.46)