Paper Explained -- SeMask: Semantically Masked Transformers for Semantic Segmentation
Whenever we work with an image transformer network, we almost always end up doing the same thing: fine-tuning a pretrained encoder backbone. This is the standard recipe, and not only for semantic segmentation. However, ignoring the semantic information in the image may be suboptimal, especially for a task like semantic segmentation. The authors of this paper address this problem by proposing a simple and effective framework that incorporates the image's semantic information into a pretrained hierarchical transformer-based backbone via a semantic attention operation. They provide empirical evidence by integrating SeMask into the Swin Transformer.
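To make the idea concrete, here is a toy NumPy sketch of what a "semantic attention" step could look like: project patch features onto a set of class scores, then feed a class-informed signal back into the features. This is an illustrative simplification, not the paper's exact operation; the function name, shapes, and the `alpha` blending factor are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_attention(feats, w_sem, alpha=0.1):
    """Toy semantic-attention step (illustrative, not the paper's exact layer).

    feats: (N, C) patch features from one backbone stage.
    w_sem: (C, K) hypothetical projection onto K semantic classes.
    Returns the semantically reweighted features and the per-patch
    class scores, which a decoder could supervise with class labels.
    """
    sem = softmax(feats @ w_sem, axis=-1)  # (N, K) per-patch class scores
    mask = sem @ w_sem.T                   # (N, C) class-informed signal
    return feats + alpha * mask, sem       # residual blend of the signal

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 32))  # 16 patches, 32-dim features
w_sem = rng.standard_normal((32, 8))   # 8 hypothetical classes
out, sem = semantic_attention(feats, w_sem)
```

The key point the example illustrates is that the semantic scores are produced inside the encoder, so the backbone's features are adapted with class information rather than leaving all semantic reasoning to the decoder.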
Apr-1-2022, 16:50:23 GMT