Disentangling speech from surroundings with neural embeddings