SEME at SemEval-2024 Task 2: Comparing Masked and Generative Language Models on Natural Language Inference for Clinical Trials

Aguiar, Mathilde, Zweigenbaum, Pierre, Naderi, Nona

Apr-5-2024–arXiv.org Artificial Intelligence

This paper describes our submission to Task 2 of SemEval-2024: Safe Biomedical Natural Language Inference for Clinical Trials. The task, Multi-evidence Natural Language Inference for Clinical Trial Data (NLI4CT), is a Textual Entailment (TE) task that evaluates the consistency and faithfulness of Natural Language Inference (NLI) models applied to Clinical Trial Reports (CTRs). We test two distinct approaches: one based on fine-tuning and ensembling Masked Language Models, and the other based on prompting Large Language Models with templates, in particular Chain-of-Thought and Contrastive Chain-of-Thought. Our best system prompts Flan-T5-large in a 2-shot setting and achieves an F1 score of 0.57, a Faithfulness of 0.64, and a Consistency of 0.56.
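
To make the 2-shot prompting setup concrete, the following is a minimal sketch of how a few-shot prompt for this kind of entailment task might be assembled. It is not the authors' template: the instruction wording, the `Premise:`/`Statement:`/`Answer:` field names, the `Entailment`/`Contradiction` label strings, and the example texts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical instruction text; the paper's actual template is not shown here.
INSTRUCTION = (
    "Does the premise entail or contradict the statement? "
    "Answer with Entailment or Contradiction."
)


@dataclass
class NLIExample:
    premise: str                  # excerpt of a clinical trial report (invented here)
    statement: str                # claim to check against the premise
    label: Optional[str] = None   # assumed labels: "Entailment" / "Contradiction"


def format_example(ex: NLIExample) -> str:
    """Render one example; the answer slot stays empty when no label is given."""
    answer = f"Answer: {ex.label}" if ex.label else "Answer:"
    return f"Premise: {ex.premise}\nStatement: {ex.statement}\n{answer}"


def build_few_shot_prompt(demos: List[NLIExample], query: NLIExample) -> str:
    """Concatenate the instruction, labeled demonstrations, and the unlabeled query."""
    unlabeled = NLIExample(query.premise, query.statement, None)
    blocks = [INSTRUCTION]
    blocks += [format_example(d) for d in demos]
    blocks.append(format_example(unlabeled))
    return "\n\n".join(blocks)


# Two invented demonstrations, giving a 2-shot prompt.
demos = [
    NLIExample(
        "Patients received 10 mg of the study drug daily for 12 weeks.",
        "Participants took the study drug every day.",
        "Entailment",
    ),
    NLIExample(
        "Adults aged 18 to 65 were eligible for enrollment.",
        "Children under 10 were eligible for enrollment.",
        "Contradiction",
    ),
]
query = NLIExample(
    "Nausea was reported by 4 of 50 participants in the treatment arm.",
    "No participant in the treatment arm reported nausea.",
)

prompt = build_few_shot_prompt(demos, query)
print(prompt)
```

The resulting string would then be passed to a generative model such as Flan-T5-large, and the generated text mapped back to a label. A Chain-of-Thought variant would, under the same assumptions, add a short rationale before each demonstration's answer.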

consistency, demonstration, language model

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States (0.68)
  - Canada > Ontario
    - Toronto (0.05)
- Europe
  - France (0.04)
  - Iceland > Capital Region
    - Reykjavik (0.04)
- Asia
  - Singapore (0.04)
  - China > Hong Kong (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Government > Regional Government
  - North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
