Transformer-Based Named Entity Recognition for French Using Adversarial Adaptation to Similar Domain Corpora

Choudhry, Arjun, Gupta, Pankaj, Khatri, Inder, Gupta, Aaryan, Nicol, Maxime, Meurs, Marie-Jean, Vishwakarma, Dinesh Kumar

Dec-5-2022–arXiv.org Artificial Intelligence

Named Entity Recognition (NER) is an information extraction task in which specific entities are extracted from unstructured text and assigned to predefined classes. While NER models for high-resource languages like English have seen notable performance gains due to improvements in model architectures and the availability of large datasets, limited-resource languages like French still face a dearth of openly available, large, labelled datasets. Recent work uses adversarial adaptation frameworks to adapt NER models from high-resource domains to low-resource domains. These approaches have been applied to high-resource languages, for which robust language models are available. We use adversarial adaptation to help models learn better, more general features by adapting them to large, unlabelled corpora, improving performance on the source test set. We propose a Transformer-based NER approach for French that uses adversarial adaptation to counter the lack of large, labelled NER datasets in French. We train Transformer-based NER models on labelled source datasets and use larger corpora from similar or mixed domains as target sets for improved feature learning. Our proposed approach helps the model draw wider-domain and general feature knowledge from easily available, large, unlabelled corpora. While we limit our evaluation to French datasets and corpora, our approach can be applied to other languages too.
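The abstract does not say how the adversarial signal is applied, so the following is only a minimal sketch under stated assumptions: a gradient-reversal style setup (as commonly used in domain-adversarial training), in which a shared encoder is trained on the NER loss from labelled source batches while being pushed against a domain discriminator that sees both source and unlabelled target batches. All function names, the ramp schedule, and the `gamma` parameter are hypothetical and are not taken from the paper.

```python
import math
from itertools import cycle


def adversarial_weight(progress, gamma=10.0):
    """Assumed ramp for the adversarial weight: 0 at the start of training,
    approaching 1 as progress (a fraction in [0, 1]) reaches 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


def pair_batches(source_batches, target_batches):
    """Yield (labelled source batch, unlabelled target batch) pairs.

    One pass covers every source batch; target batches are reused
    cyclically if the target list happens to be shorter.
    """
    return zip(source_batches, cycle(target_batches))


def encoder_gradient(grad_ner, grad_domain, lam):
    """Combine gradients for the shared encoder with gradient reversal.

    The encoder descends the NER loss but ascends the domain loss, so the
    domain gradient enters with a negative sign, scaled by lam.
    """
    return [g_n - lam * g_d for g_n, g_d in zip(grad_ner, grad_domain)]


if __name__ == "__main__":
    source = ["src_batch_1", "src_batch_2", "src_batch_3"]  # labelled NER data
    target = ["tgt_batch_1", "tgt_batch_2"]                 # unlabelled corpus
    for step, (src, tgt) in enumerate(pair_batches(source, target)):
        lam = adversarial_weight(step / len(source))
        print(step, src, tgt, round(lam, 3))
```

In a real model the two gradient lists would come from backpropagating the token-classification loss and the domain-classification loss through the same Transformer encoder; the sketch only shows how the two signals are paired and combined.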

artificial intelligence, machine learning, natural language

Country:
- North America > Canada
  - Quebec > Montreal (0.16)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.05)
  - Germany > Saxony
    - Leipzig (0.05)
- Asia > India
  - NCT
    - New Delhi (0.05)
    - Delhi (0.05)

Genre:
- Research Report (0.51)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.84)
