AISTAT lab system for DCASE2025 Task6: Language-based audio retrieval

Kim, Hyun Jun, Choi, Hyeong Yong, Lim, Changwon

Sep-23-2025–arXiv.org Artificial Intelligence

ABSTRACT This report presents the AISTAT team's submission to the language-based audio retrieval task in DCASE 2025 Task 6. Our proposed system employs a dual-encoder architecture, where audio and text modalities are encoded separately and their representations are aligned using contrastive learning. Additionally, we incorporated clustering to introduce an auxiliary classification task for further fine-tuning. Our best single system achieved a mAP@16 of 46.62, while an ensemble of four systems reached a mAP@16 of 48.83 on the Clotho development test split.

Index Terms -- Audio-text retrieval, contrastive learning, knowledge distillation, topic modeling

1. INTRODUCTION

The DCASE 2025 Task 6 challenge [1] focuses on language-based audio retrieval, a task that requires retrieving the audio recordings from a database that best match a given textual query, and vice versa.
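The dual-encoder alignment described in the abstract can be illustrated with a small sketch. The report excerpt does not state the exact loss, so the symmetric softmax cross-entropy form below, the temperature value of 0.07, and the function names are all assumptions made for illustration, not the authors' implementation. It is written in plain Python to stay dependency-free; a real system would compute this on batched tensors produced by the audio and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Hypothetical symmetric contrastive loss over paired embeddings.

    audio_emb[i] and text_emb[i] are assumed to be a matching pair; every
    other combination in the batch is treated as a negative.
    The temperature default is an assumption, not a value from the report.
    """
    a = [l2_normalize(v) for v in audio_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(a)
    # Cosine similarity between every audio clip and every caption.
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Audio-to-text direction: each row should peak on its diagonal entry.
    a2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-audio direction: same idea applied to the columns.
    columns = [list(col) for col in zip(*logits)]
    t2a = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (a2t + t2a)


# Toy batch: two audio embeddings with matching and mismatched captions.
audio = [[1.0, 0.0], [0.0, 1.0]]
text_matched = [[2.0, 0.0], [0.0, 3.0]]
text_swapped = [[0.0, 3.0], [2.0, 0.0]]
loss_matched = symmetric_contrastive_loss(audio, text_matched)
loss_swapped = symmetric_contrastive_loss(audio, text_swapped)
```

In this toy batch the correctly paired captions yield a much smaller loss than the swapped ones, which is the behavior contrastive training exploits: minimizing the loss pulls each audio embedding toward its own caption and away from the others in the batch.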

caption, large language model, machine learning, (16 more...)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning (0.69)
  - Natural Language > Large Language Model (0.49)
