Weakly-supervised Automated Audio Captioning via text only training
