Multimodal Contextualized Plan Prediction for Embodied Task Completion