Explainable unsupervised multi-modal image registration using deep networks