GPU Memory Prediction for Multimodal Model Training