Learning Spatially-Aware Language and Audio Embeddings