Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics?