Simple Questions Generate Named Entity Recognition Datasets

Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang

arXiv.org Artificial Intelligence, November 5, 2022

Recent named entity recognition (NER) models often rely on human-annotated datasets, which require substantial expert knowledge of the target domain and entity types. This work introduces an ask-to-generate approach that automatically generates NER datasets by asking simple natural-language questions (e.g., "Which disease?") to an open-domain question answering system. Despite using fewer in-domain resources, our models, trained solely on the generated datasets, outperform strong low-resource models by an average of 19.4 F1 points on six popular NER benchmarks. Our models also perform competitively with rich-resource models that additionally use in-domain dictionaries curated by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks and achieve new state-of-the-art performance.
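The abstract describes how the datasets are produced only at a high level. As an illustration, here is a minimal sketch of one plausible final step: turning question-answer results into BIO-tagged training sentences. It assumes a QA system (not shown) has already returned `(sentence, answer)` pairs for each question such as "Which disease?". The names `bio_label`, `build_dataset`, and the entity-type keys (`DISEASE`, `CHEMICAL`) are hypothetical, and exact string matching with longest-match-first BIO tagging is a common weak-labeling choice, not necessarily the authors' procedure.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input format (an assumption, not from the paper): for each entity
# type, the (sentence, answer) pairs a QA system returned for its question.
QAPairs = Dict[str, List[Tuple[str, str]]]


def bio_label(sentence: str, answers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Tag each (answer, type) occurrence in a whitespace-tokenized sentence with BIO tags."""
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    # Match longer answers first so "breast cancer" is preferred over "cancer".
    for answer, etype in sorted(answers, key=lambda a: len(a[0].split()), reverse=True):
        span = answer.split()
        n = len(span)
        if n == 0:
            continue  # skip empty answers
        for i in range(len(tokens) - n + 1):
            # Only label tokens that no earlier (longer) match has already claimed.
            if tokens[i:i + n] == span and all(t == "O" for t in tags[i:i + n]):
                tags[i] = f"B-{etype}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{etype}"
    return list(zip(tokens, tags))


def build_dataset(qa_pairs: QAPairs) -> List[List[Tuple[str, str]]]:
    """Merge answers for the same sentence across entity types; emit one labeled example per sentence."""
    by_sentence: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for etype, pairs in qa_pairs.items():
        for sentence, answer in pairs:
            by_sentence[sentence].append((answer, etype))
    return [bio_label(s, answers) for s, answers in by_sentence.items()]


if __name__ == "__main__":
    # Toy QA output for the hypothetical questions "Which disease?" and "Which chemical?".
    qa_pairs = {
        "DISEASE": [("Tamoxifen is used to treat breast cancer .", "breast cancer")],
        "CHEMICAL": [("Tamoxifen is used to treat breast cancer .", "Tamoxifen")],
    }
    for example in build_dataset(qa_pairs):
        for token, tag in example:
            print(token, tag)
```

On the toy input, the shared sentence becomes a single example in which "Tamoxifen" is tagged `B-CHEMICAL` and "breast cancer" is tagged `B-DISEASE I-DISEASE`. Labeling each answer only in the sentence it came from keeps the sketch simple; matching every answer against every sentence would behave more like dictionary-based weak supervision and would label more mentions at the cost of more noise.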

artificial intelligence, information retrieval, natural language

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment > Sports (1.00)
- Media (0.93)
- Government (0.68)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Therapeutic Area
    - Neurology (1.00)
    - Infections and Infectious Diseases (0.68)
    - Cardiology/Vascular Diseases (0.68)

Technology:
- Information Technology > Artificial Intelligence > Natural Language
  - Text Processing (1.00)
  - Information Retrieval (1.00)
