Weakly-supervised Automated Audio Captioning via text-only training