Can representation learning for multimodal image registration be improved by supervision of intermediate layers?
