A Transformer-based Audio Captioning Model with Keyword Estimation
