Inferring spatial relations from textual descriptions of images