Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
