Pseudo-Siamese Blind-Spot Transformers for Self-Supervised Real-World Denoising

Neural Information Processing Systems 

Real-world image denoising remains a challenging task. This paper studies self-supervised image denoising, which requires only noisy images captured in a single shot. We revamp the blind-spot technique by leveraging the transformer's capability for long-range pixel interactions, which is crucial for removing the noise dependency among neighboring pixels, a requirement for achieving strong performance with the blind-spot technique. The proposed method integrates these elements with two key innovations: a directional self-attention (DSA) module that performs self-attention over a half-plane grid, creating a sophisticated blind-spot structure, and a pseudo-Siamese architecture with mutual learning that mitigates the performance loss caused by the restricted attention grid in DSA. Experiments on benchmark datasets demonstrate that our method outperforms existing self-supervised and clean-image-free methods. This combination of blind-spot and transformer techniques provides a natural synergy for tackling real-world image denoising challenges.
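To make the half-plane idea concrete, the following is a minimal sketch (not the authors' code) of self-attention restricted to a half-plane grid: each query pixel attends only to pixels strictly above it, so its own value is excluded from the receptive field. The function names, the single-head formulation, and the choice of the upper half-plane are illustrative assumptions; combining rotated variants of such a mask is one way a full blind-spot receptive field could be assembled, though the paper's exact construction may differ.

```python
import torch


def half_plane_mask(h, w, device=None):
    """Boolean mask of shape (h*w, h*w): query pixel q may attend to key
    pixel k only if k lies strictly above q (smaller row index), so the
    allowed region is an upper half-plane that excludes q itself."""
    rows = torch.arange(h, device=device).repeat_interleave(w)  # row index of each flattened pixel
    return rows.unsqueeze(1) > rows.unsqueeze(0)                # (q, k): True where k's row < q's row


def directional_self_attention(x):
    """x: (B, H, W, C) feature map. Aggregates features only from the
    half-plane above each pixel, i.e. one 'direction' of a blind-spot field."""
    b, h, w, c = x.shape
    tokens = x.reshape(b, h * w, c)
    q, k, v = tokens, tokens, tokens                        # single head, no projections, for brevity
    attn = torch.einsum("bqc,bkc->bqk", q, k) / c ** 0.5    # scaled dot-product scores
    mask = half_plane_mask(h, w, device=x.device)
    attn = attn.masked_fill(~mask, float("-inf"))           # forbid attention outside the half-plane
    # Top-row pixels have an empty half-plane; replace the resulting NaNs with zeros.
    weights = torch.softmax(attn, dim=-1).nan_to_num(0.0)
    out = torch.einsum("bqk,bkc->bqc", weights, v)
    return out.reshape(b, h, w, c)
```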