Generating Natural Questions from Images for Multimodal Assistants