Audio Visual Segmentation Through Text Embeddings