Simple Questions Generate Named Entity Recognition Datasets
Kim, Hyunjae, Yoo, Jaehyo, Yoon, Seunghyun, Lee, Jinhyuk, Kang, Jaewoo
arXiv.org Artificial Intelligence
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require substantial professional knowledge of the target domain and entities. This research introduces an ask-to-generate approach that automatically generates NER datasets by asking an open-domain question answering system simple natural-language questions (e.g., "Which disease?"). Despite using fewer in-domain resources, our models, trained solely on the generated datasets, substantially outperform strong low-resource models by an average of 19.4 F1 points across six popular NER benchmarks. Furthermore, our models perform competitively with rich-resource models that additionally leverage in-domain dictionaries provided by domain experts. In few-shot NER, we outperform the previous best model by 5.2 F1 points on three benchmarks, achieving new state-of-the-art performance.
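As a rough illustration of the ask-to-generate idea described above, the following minimal sketch turns question-answering output into BIO-tagged NER training examples. It is not the authors' implementation: the abstract does not specify the pipeline, so the `toy_qa_system` stub, the (answer, evidence sentence) return shape, the `DISEASE` label, and the exact-match tagging rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Assumed interface: a QA system maps a question to (answer phrase, evidence sentence) pairs.
QASystem = Callable[[str], List[Tuple[str, str]]]


def toy_qa_system(question: str) -> List[Tuple[str, str]]:
    """Hypothetical stand-in for an open-domain QA system; returns canned results."""
    canned = {
        "Which disease?": [
            ("malaria", "Mosquitoes can transmit malaria to humans ."),
            ("type 2 diabetes", "Obesity raises the risk of type 2 diabetes ."),
        ],
    }
    return canned.get(question, [])


def bio_tag(tokens: List[str], answer_tokens: List[str], label: str) -> List[str]:
    """Tag every case-insensitive exact match of the answer span with B-/I- labels."""
    tags = ["O"] * len(tokens)
    n = len(answer_tokens)
    if n == 0:
        return tags
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in answer_tokens]
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{label}"
    return tags


def generate_dataset(
    questions: Dict[str, str], qa: QASystem
) -> List[Tuple[List[str], List[str]]]:
    """Ask each question and convert the returned answers into tagged sentences."""
    dataset = []
    for question, label in questions.items():
        for answer, sentence in qa(question):
            tokens = sentence.split()
            tags = bio_tag(tokens, answer.split(), label)
            # Keep only sentences in which the answer span was actually found.
            if any(tag != "O" for tag in tags):
                dataset.append((tokens, tags))
    return dataset


# One simple question per entity type (the label name is an assumption).
dataset = generate_dataset({"Which disease?": "DISEASE"}, toy_qa_system)
for tokens, tags in dataset:
    print(list(zip(tokens, tags)))
```

In a real setting the stub would be replaced by an actual open-domain QA system, and the resulting token and tag pairs would be used to train a standard NER model.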
Nov-5-2022