Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval
