AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-Ranking

Ashbaugh, Cem, Baumgärtner, Leon, Gress, Tim, Sidorov, Nikita, Werner, Daniel

Sep-25-2025–arXiv.org Artificial Intelligence

Linking implicit scientific claims made on social media to their original publications is crucial for evidence-based fact-checking and scholarly discourse, yet it is hindered by lexical sparsity, very short queries, and domain-specific language. Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. The optimized sparse-retrieval baseline (BM25) achieves MRR@5 = 0.5025 on the gold-label blind test set. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned with in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage that uses a SciBERT cross-encoder. Replacing purely lexical matching with neural representations lifts performance to MRR@5 = 0.6174, and the complete pipeline improves it further to MRR@5 = 0.6828. The findings demonstrate that coupling dense retrieval with neural re-rankers delivers a powerful and efficient solution for tweet-to-study matching and provides a practical blueprint for future evidence-retrieval pipelines.
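To make the reported metric and the two-stage design concrete, the following is a minimal sketch rather than the authors' implementation. It assumes one gold publication per tweet, and every name in it (`mrr_at_k`, `retrieve_then_rerank`, `first_stage_score`, `rerank_score`, `n_candidates`) is hypothetical. The two scoring callables stand in for the dual encoder and the cross-encoder, and the candidate-pool size of 100 is an illustrative default, not a value taken from the abstract.

```python
from typing import Callable, Dict, List, Sequence


def mrr_at_k(ranked: Dict[str, Sequence[str]], gold: Dict[str, str], k: int = 5) -> float:
    """Mean reciprocal rank, counting a hit only if the gold document is in the top k.

    Assumes exactly one gold document per query (an assumption of this sketch).
    """
    if not gold:
        return 0.0
    total = 0.0
    for query_id, gold_doc in gold.items():
        top_k = list(ranked.get(query_id, []))[:k]
        if gold_doc in top_k:
            # Ranks are 1-based, so the first position contributes 1.0.
            total += 1.0 / (top_k.index(gold_doc) + 1)
    return total / len(gold)


def retrieve_then_rerank(
    query: str,
    doc_ids: Sequence[str],
    first_stage_score: Callable[[str, str], float],  # stand-in for a dual encoder
    rerank_score: Callable[[str, str], float],       # stand-in for a cross-encoder
    n_candidates: int = 100,                         # illustrative pool size
    k: int = 5,
) -> List[str]:
    """Keep the n_candidates best first-stage documents, then re-order them."""
    candidates = sorted(
        doc_ids, key=lambda d: first_stage_score(query, d), reverse=True
    )[:n_candidates]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:k]
```

The point of the split is cost: the first-stage scorer is applied to the whole collection, while the more expensive re-ranker only sees the short candidate list, so a gold document that the first stage misses cannot be recovered by the second.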

information retrieval, machine learning, natural language

Country:
- Europe (0.93)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study > Negative Result (0.64)

Industry:
- Health & Medicine (0.70)

Technology:
- Information Technology
  - Information Management (0.89)
  - Communications > Social Media (0.85)
  - Artificial Intelligence
    - Natural Language > Information Retrieval (0.68)
    - Representation & Reasoning (0.67)
    - Machine Learning
      - Neural Networks > Deep Learning (0.93)
      - Inductive Learning (0.69)
      - Supervised Learning (0.69)
