Multimodal Differential Network for Visual Question Generation