Intrinsic Explainability of Multimodal Learning for Crop Yield Prediction