Can Transformers Capture Spatial Relations between Objects?
