Unsupervised Fact Verification by Language Model Distillation

Bazaga, Adrián, Liò, Pietro, Micklem, Gos

arXiv.org Machine Learning 

Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.

In recent years, the issue of automated fact verification has gained considerable attention as the volume of potentially misleading and false claims rises (Guo et al., 2022), resulting in the development of fully automated methods for fact checking (see Thorne et al. (2018); Zubiaga et al. (2018); Guo et al. (2022); Vladika & Matthes (2023); Das et al. (2023) for recent surveys). Pioneering research in the field of Natural Language Processing (NLP) has led to the emergence of (large) language models (LMs) (e.g.
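The abstract describes a contrastive objective that pulls each claim's features toward its matching evidence while pushing them away from non-matching facts. The paper's exact loss is not reproduced here; the sketch below is a generic InfoNCE-style contrastive alignment loss over a batch of claim and fact embeddings, with the function name, batch layout, and temperature value chosen purely for illustration.

```python
import numpy as np

def contrastive_alignment_loss(claims, facts, temperature=0.07):
    """InfoNCE-style loss: for claim i, fact i is the positive pair
    and all other facts in the batch act as in-batch negatives."""
    # L2-normalise so dot products become cosine similarities
    claims = claims / np.linalg.norm(claims, axis=1, keepdims=True)
    facts = facts / np.linalg.norm(facts, axis=1, keepdims=True)
    logits = claims @ facts.T / temperature        # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    # log-softmax over facts for each claim; positives sit on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

Under such a loss, a batch where each claim embedding coincides with its paired fact embedding scores lower (better) than one where the pairing is shuffled, which is the alignment behaviour the abstract refers to.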
