Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation
Guimarães, Artur, Martins, Bruno, Magalhães, João
arXiv.org Artificial Intelligence
… Language Processing (NLP) tasks, including in the assessment of textual entailment relations. However, these models are heavily susceptible to shortcut learning (Du et al., 2023), factual inconsistency (Xie et al., 2023), and performance degradation when exposed to data from specialized domains, such as in the case of medical data.

Our overall best submission to the task achieved a macro F1-score of 0.80 (1st place on the leaderboard), a consistency score of 0.72 (15th), and a faithfulness score of 0.83 (11th). Our method excels in classification accuracy, but fails at being robust to perturbations on the statements, i.e. predicting …
Aug-6-2024
- Country:
- North America > United States (0.04)
- Europe > Portugal
- Genre:
- Research Report > Experimental Study (1.00)