Supplementary Material for WT-MVSNet: Window-based Transformers for Multi-view Stereo
Neural Information Processing Systems
Since Transformers are limited by the input image resolution, our proposed Window-based Epipolar Transformer (WET) only operates on features at 1/4 of the raw resolution. In the inter-attention module, to better partition the corresponding windows of the source features, we iterate at 1/4 resolution twice. The first iteration estimates a depth map without WET. In the second iteration, we use these depth values to warp the center pixels of the reference feature windows, which locates the corresponding windows in the source features. To train WET with supervision at all stages, we design a transformed feature pathway that interpolates the feature map processed by WET to 1/2 and full resolution and adds it to the corresponding feature map at the next stage.
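The following is a minimal sketch, not the authors' implementation, of the two operations described above: using the depth estimated in the first iteration to warp reference window centers into the source view (where the corresponding source windows are then partitioned), and a transformed feature pathway that upsamples the WET-processed 1/4-resolution features and adds them to the finer stages. Function names such as `warp_window_centers` and `transformed_feature_pathway` and the assumption of matching channel counts across stages are illustrative.

```python
import torch
import torch.nn.functional as F


def warp_window_centers(centers, depth, K_ref, K_src, T_ref2src):
    """Project reference window-center pixels into the source view.

    centers:   (N, 2) pixel coordinates (x, y) of reference window centers
    depth:     (N,)   depths estimated at those centers in the first iteration
    K_ref:     (3, 3) reference intrinsics; K_src: (3, 3) source intrinsics
    T_ref2src: (4, 4) relative pose from the reference to the source camera
    returns:   (N, 2) source pixel coordinates around which the corresponding
               source windows are partitioned in the second iteration
    """
    ones = torch.ones_like(centers[:, :1])
    pix_h = torch.cat([centers, ones], dim=1)                 # (N, 3) homogeneous pixels
    cam = (torch.linalg.inv(K_ref) @ pix_h.T) * depth         # (3, N) back-projected points
    cam_h = torch.cat([cam, torch.ones_like(cam[:1])], dim=0) # (4, N)
    src_cam = (T_ref2src @ cam_h)[:3]                         # (3, N) in source camera frame
    src_pix = K_src @ src_cam                                 # (3, N) projected pixels
    return (src_pix[:2] / src_pix[2:].clamp(min=1e-6)).T      # (N, 2)


def transformed_feature_pathway(feat_quarter, feat_half, feat_full):
    """Upsample the WET-processed 1/4-resolution feature map (B, C, H, W) and
    add it to the 1/2- and full-resolution feature maps of the next stages
    (assuming matching channel counts; a 1x1 conv could align them otherwise)."""
    up_half = F.interpolate(feat_quarter, scale_factor=2,
                            mode="bilinear", align_corners=False)
    up_full = F.interpolate(feat_quarter, scale_factor=4,
                            mode="bilinear", align_corners=False)
    return feat_half + up_half, feat_full + up_full
```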