TartuNLP at SemEval-2025 Task 5: Subject Tagging as Two-Stage Information Retrieval
Dorkin, Aleksei, Sirts, Kairit
arXiv.org Artificial Intelligence
We present our submission to SemEval-2025 Task 5, which aims to aid librarians in assigning subject tags to library records by producing a list of likely relevant tags for a given document. We frame the task as an information retrieval problem in which the document content is used to retrieve subject tags from a large subject taxonomy. We use two types of encoder models to build a two-stage information retrieval system: a bi-encoder for coarse-grained candidate extraction in the first stage, and a cross-encoder for fine-grained re-ranking in the second stage. This approach proved effective, demonstrating significant improvements in recall compared to single-stage methods and showing competitive results according to qualitative evaluation.
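The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function and the token-overlap `pair_score` function are toy stand-ins (assumptions made for illustration) for a trained bi-encoder and cross-encoder, and all function names and the parameters `k` and `top_n` are hypothetical. The sketch shows only the control flow: a cheap first pass scores the document against every tag independently, and a second pass re-scores the surviving candidates as (document, tag) pairs.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector.

    A real bi-encoder would map the text to a dense vector; the key property
    kept here is that documents and tags are embedded independently.
    """
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(document, tags, k):
    """Stage 1: rank all tags by embedding similarity and keep the top k."""
    doc_vec = embed(document)
    scored = [(cosine(doc_vec, embed(tag)), tag) for tag in tags]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in scored[:k]]


def pair_score(document, tag):
    """Toy stand-in for a cross-encoder: scores the (document, tag) pair jointly.

    Here the score is the fraction of tag tokens that occur in the document;
    a real cross-encoder would read both texts together and output a relevance score.
    """
    doc_tokens = set(tokenize(document))
    tag_tokens = tokenize(tag)
    if not tag_tokens:
        return 0.0
    return sum(t in doc_tokens for t in tag_tokens) / len(tag_tokens)


def tag_document(document, tags, k=3, top_n=2):
    """Run both stages: coarse candidate retrieval, then fine-grained re-ranking."""
    candidates = retrieve_candidates(document, tags, k)
    reranked = sorted(candidates, key=lambda tag: pair_score(document, tag), reverse=True)
    return reranked[:top_n]
```

The split matters for cost: the first stage can precompute tag embeddings once for the whole taxonomy, while the more expensive pairwise scorer only ever sees `k` candidates per document.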
May-1-2025
- Genre:
- Research Report (0.50)
- Technology: