Can representation learning for multimodal image registration be improved by supervision of intermediate layers?