From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation