Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions