Self-supervised cross-modality learning for uncertainty-aware object detection and recognition in applications which lack pre-labelled training data