Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model