Multimodal Punctuation Prediction with Contextual Dropout