Can Transformers Capture Spatial Relations between Objects?