Transformer-Based Named Entity Recognition for French Using Adversarial Adaptation to Similar Domain Corpora

Choudhry, Arjun; Gupta, Pankaj; Khatri, Inder; Gupta, Aaryan; Nicol, Maxime; Meurs, Marie-Jean; Vishwakarma, Dinesh Kumar

arXiv.org Artificial Intelligence 

Named Entity Recognition (NER) is an information extraction task in which specific entities are extracted from unstructured text and labelled with predefined classes. While NER models for high-resource languages like English have seen notable performance gains thanks to improved model architectures and the availability of large datasets, limited-resource languages like French still face a dearth of openly available, large, labelled datasets. Recent work uses adversarial adaptation frameworks to adapt NER models from high-resource domains to low-resource domains. These approaches have so far been applied to high-resource languages, where robust language models are available. We use adversarial adaptation to help models learn better, more generalized features by adapting them to large, unlabelled corpora, with the aim of improving performance on the source test set. We propose a Transformer-based NER approach for French that uses adversarial adaptation to counter the lack of large, labelled NER datasets in French. We train Transformer-based NER models on labelled source datasets and use larger corpora from similar or mixed domains as target sets for improved feature learning. Our proposed approach helps draw wider-domain and general feature knowledge from easily available large, unlabelled corpora. While we limit our evaluation to French datasets and corpora, the approach can be applied to other languages as well.
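
The data setup described above, a labelled source set paired with a larger unlabelled target corpus, can be illustrated with a minimal sketch. The abstract does not specify how batches are built, so everything here is an assumption made for illustration: the function name `mixed_batches`, the half-source/half-target batch layout, and the `SOURCE`/`TARGET` domain labels are hypothetical. The idea shown is only that source sentences carry both NER tags and a domain label, while target sentences carry a domain label alone, which is what a domain discriminator in an adversarial adaptation setup would be trained on.

```python
from itertools import cycle, islice

# Hypothetical domain labels for a domain discriminator (assumption, not from the paper).
SOURCE, TARGET = 0, 1


def mixed_batches(source, target, batch_size):
    """Yield batches that mix labelled source and unlabelled target sentences.

    source: list of (tokens, tags) pairs from the labelled NER dataset.
    target: list of token lists from the unlabelled corpus.
    Each example is a dict with "tokens", "tags" (None for target), and "domain".
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to hold both domains")
    half = batch_size // 2
    target_stream = cycle(target)  # reuse target sentences if they run out
    for start in range(0, len(source), half):
        batch = [
            {"tokens": tokens, "tags": tags, "domain": SOURCE}
            for tokens, tags in source[start:start + half]
        ]
        # Add as many unlabelled target sentences as there are source sentences.
        batch += [
            {"tokens": tokens, "tags": None, "domain": TARGET}
            for tokens in islice(target_stream, len(batch))
        ]
        yield batch


# Toy French data, invented for this example.
source = [(["Paris"], ["B-LOC"]), (["Marie", "Curie"], ["B-PER", "I-PER"])]
target = [["Le", "chat"], ["Il", "pleut"], ["Bonjour"]]
batches = list(mixed_batches(source, target, batch_size=2))
```

In such a setup, the NER loss would be computed only on examples whose `tags` are not `None`, while the domain loss would use every example in the batch.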
