SurDis: A Surface Discontinuity Dataset for Wearable Technology to Assist Blind Navigation in Urban Environments

Neural Information Processing Systems

According to the World Health Organization, an estimated 2.2 billion people worldwide have a near or distance vision impairment. Difficulty in self-navigation is one of the greatest challenges to independence for blind and low vision (BLV) people. Through consultations with several BLV service providers, we realized that negotiating surface discontinuities is one of the most prominent challenges when navigating outdoor urban environments. Surface discontinuities are commonly formed by rises and drop-offs along a pathway. They can threaten a person's balance while walking, and perceiving such a threat is highly challenging for BLV people.

Dynamic Grained Encoder for Vision Transformers

Neural Information Processing Systems

The budget for DGE is set to 0.5. "Resolution" refers to the side length of the input images. As shown in Figure 1(a), one limitation of our work is that the acceleration ratio on GPUs (with a native PyTorch implementation) is low when the input image is small. We suspect that the additional modules of DGE introduce more scheduling steps, and that these steps add a fixed time cost that does not shrink with the input size. Nevertheless, our method is clearly more efficient on large input images, which is crucial for many downstream tasks and practical scenarios.
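The suspected effect of a fixed overhead can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not a measurement from the paper: the `speedup` function, the reading of the budget as the fraction of baseline compute that is kept, and the overhead value are all hypothetical, and times are in arbitrary units.

```python
def speedup(baseline_time, budget, overhead):
    """Speedup over the baseline under a toy cost model.

    Assumption (not from the paper): the accelerated model costs
    budget * baseline_time plus a constant scheduling overhead.
    """
    return baseline_time / (budget * baseline_time + overhead)


# Hypothetical baseline times standing in for growing input resolutions.
for baseline_time in (1.0, 4.0, 16.0, 64.0):
    ratio = speedup(baseline_time, budget=0.5, overhead=1.0)
    print(f"baseline={baseline_time:5.1f}  speedup={ratio:.2f}")
```

Under this model the ratio is below 1 when the baseline cost is comparable to the overhead, and it approaches 1 / budget as the baseline cost grows, which matches the qualitative trend described above.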


Supplementary Materials: Shape Registration in the Time of Transformers

Neural Information Processing Systems

In this section, we describe in detail the proposed architecture and its implementation. Our architecture is composed of an encoder and a decoder. The encoder receives as input a predefined number of learnable latent probes LP, together with the point coordinates of the target point cloud XT. Each encoder layer performs cross-attention between LP and XT, followed by self-attention on LP. Each attention operation is followed by a feed-forward layer.
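The order of operations in one encoder layer can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes single-head attention, random weights in place of learned ones, a hypothetical linear embedding of the 3-D coordinates, and it omits residual connections and normalization, none of which are specified in the text above. All function and parameter names are made up for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, context, w_q, w_k, w_v):
    # Single-head scaled dot-product attention (an assumption).
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def feed_forward(x, w_in, w_out):
    # Two-layer MLP with ReLU (the activation is an assumption).
    return np.maximum(x @ w_in, 0.0) @ w_out


def init_params(rng, dim, hidden):
    # Random stand-ins for learned weights.
    def mat(rows, cols):
        return rng.normal(scale=1.0 / np.sqrt(rows), size=(rows, cols))

    return {
        "cross": [mat(dim, dim) for _ in range(3)],
        "ff_cross": [mat(dim, hidden), mat(hidden, dim)],
        "self": [mat(dim, dim) for _ in range(3)],
        "ff_self": [mat(dim, hidden), mat(hidden, dim)],
    }


def encoder_layer(probes, points, params):
    # 1) Cross-attention: the latent probes query the target points.
    h = attention(probes, points, *params["cross"])
    h = feed_forward(h, *params["ff_cross"])
    # 2) Self-attention among the probes.
    h = attention(h, h, *params["self"])
    return feed_forward(h, *params["ff_self"])


rng = np.random.default_rng(0)
num_probes, num_points, dim = 8, 100, 16
probes = rng.normal(size=(num_probes, dim))   # stands in for LP
coords = rng.normal(size=(num_points, 3))     # stands in for XT
embed = rng.normal(size=(3, dim))             # hypothetical point embedding
out = encoder_layer(probes, coords @ embed, init_params(rng, dim, hidden=32))
print(out.shape)  # (8, 16)
```

The output keeps one row per probe regardless of how many target points are supplied, since only the probes act as queries.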