AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval

Kim, Hyun Jun; Choi, Hyeong Yong; Lim, Changwon

arXiv.org Artificial Intelligence 

ABSTRACT

This report presents the AISTAT team's submission to the language-based audio retrieval task in DCASE 2025 Task 6. Our proposed system employs a dual-encoder architecture, in which the audio and text modalities are encoded separately and their representations are aligned using contrastive learning. Additionally, we incorporated clustering to introduce an auxiliary classification task for further fine-tuning. Our best single system achieved a mAP@16 of 46.62, while an ensemble of four systems reached a mAP@16 of 48.83 on the Clotho development test split.

Index Terms -- Audio-text retrieval, contrastive learning, knowledge distillation, topic modeling

1. INTRODUCTION

The DCASE 2025 Task 6 challenge [1] focuses on language-based audio retrieval, a task that requires retrieving the audio recordings in a database that best match a given textual query, and vice versa.
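To make the dual-encoder idea from the abstract concrete, the following is a minimal sketch of a symmetric contrastive objective over a batch of paired audio and text embeddings. It is an illustration under stated assumptions, not the loss used in this report: the text does not specify the exact formulation, so a common cross-entropy form over cosine similarities is assumed here. The function names and the temperature value of 0.07 are hypothetical, and random arrays stand in for real encoder outputs.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired embeddings.

    Row i of audio_emb and row i of text_emb are assumed to form a
    matching pair; every other row in the batch acts as a negative.
    The temperature is an assumed value, not taken from the report.
    """
    # L2-normalise so the dot product equals cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # logits[i, j] = scaled similarity between audio i and text j.
    logits = a @ t.T / temperature
    idx = np.arange(len(a))

    # Audio-to-text: each audio clip should rank its own caption first.
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-audio: each caption should rank its own audio clip first.
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a2t + loss_t2a)


# Toy usage with random stand-ins for encoder outputs.
rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
text = audio + 0.1 * rng.normal(size=(8, 16))  # roughly aligned pairs
loss = symmetric_contrastive_loss(audio, text)
```

Averaging the two directions reflects the task's "and vice versa" framing: the same similarity matrix is read row-wise for one retrieval direction and column-wise for the other.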
