Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models