Generating Visual Spatial Description via Holistic 3D Scene Understanding