Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation