ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models
Heng, Yuzhao, Deng, Chunyuan, Li, Yitong, Yu, Yue, Li, Yinghao, Zhang, Rongzhi, Zhang, Chao
arXiv.org Artificial Intelligence
Although Large Language Models (LLMs) exhibit remarkable adaptability across domains, these models often fall short in structured knowledge extraction tasks such as named entity recognition (NER). This paper explores an innovative, cost-efficient strategy to harness LLMs with modest NER capabilities for producing superior NER datasets. Our approach diverges from the basic class-conditional prompts by instructing LLMs to self-reflect on the specific domain, thereby generating domain-relevant attributes (such as category and emotions for movie reviews), which are utilized for creating attribute-rich training data. Furthermore, we preemptively generate entity terms and then develop NER context data around these entities, effectively bypassing the LLMs' challenges with complex structures. Our experiments across both general and niche domains reveal significant performance enhancements over conventional data generation methods while being more cost-effective than existing alternatives.
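The entity-first idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or prompt wording. The attribute values, the entity pool, the entity types (`Title`, `Director`), and all function names are hypothetical placeholders, and no LLM is called. The sketch shows only how sampled attributes and pre-selected entities could be combined into a generation prompt, and why annotation becomes simple string matching when the entities are fixed in advance.

```python
import random

# Hypothetical attribute values. In the approach described above, an LLM would
# propose these after reflecting on the domain; they are hard-coded here.
ATTRIBUTES = {
    "category": ["plot summary", "cast question", "recommendation request"],
    "emotion": ["excited", "disappointed", "curious"],
}

# Hypothetical entity pool standing in for entity terms generated up front.
ENTITY_POOL = {
    "Title": ["Inception", "Spirited Away"],
    "Director": ["Christopher Nolan", "Hayao Miyazaki"],
}


def sample_attributes(rng):
    """Pick one value per attribute dimension."""
    return {name: rng.choice(values) for name, values in ATTRIBUTES.items()}


def sample_entities(rng, k=2):
    """Pick k distinct entity types, then one term for each type."""
    types = rng.sample(sorted(ENTITY_POOL), k)
    return [(rng.choice(ENTITY_POOL[t]), t) for t in types]


def build_prompt(attrs, entities):
    """Assemble a generation prompt (illustrative wording, an assumption)."""
    attr_text = "; ".join(f"{k}: {v}" for k, v in attrs.items())
    ent_text = ", ".join(f'"{term}" ({etype})' for term, etype in entities)
    return (
        "Write one sentence a user might say about movies.\n"
        f"Attributes: {attr_text}\n"
        f"The sentence must mention these entities verbatim: {ent_text}"
    )


def annotate(sentence, entities):
    """Recover character spans by exact string match.

    Because the entities were chosen before the sentence was written,
    no structured output has to be parsed from the LLM.
    """
    spans = []
    for term, etype in entities:
        start = sentence.find(term)
        if start != -1:
            spans.append((start, start + len(term), etype))
    return sorted(spans)
```

For example, `annotate("I loved Inception by Christopher Nolan", [("Inception", "Title"), ("Christopher Nolan", "Director")])` yields the two character spans for the title and the director. A real pipeline would additionally need to handle sentences in which the LLM paraphrases or omits a requested entity.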
Jun-9-2024
- Country:
- Oceania > Australia (0.14)
- Antarctica (0.04)
- South America
- Brazil > Rio de Janeiro
- Rio de Janeiro (0.04)
- Argentina > Pampas
- Buenos Aires F.D. > Buenos Aires (0.04)
- Pacific Ocean > North Pacific Ocean
- San Francisco Bay > Golden Gate (0.04)
- North America
- The Bahamas (0.14)
- Dominican Republic (0.04)
- United States
- Indiana (0.04)
- Hawaii (0.04)
- District of Columbia > Washington (0.04)
- Nevada > Clark County
- Las Vegas (0.04)
- New York
- Queens County > New York City (0.04)
- New York County > New York City (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- California
- San Francisco County > San Francisco (0.04)
- Los Angeles County > Los Angeles (0.04)
- Canada
- Europe
- United Kingdom (0.14)
- Slovakia (0.04)
- Austria (0.04)
- Holy See > Vatican City (0.04)
- Czechia > Prague (0.04)
- Russia > Central Federal District
- Moscow Oblast > Moscow (0.04)
- Spain > Galicia
- Madrid (0.04)
- Germany > Bavaria
- Upper Bavaria > Munich (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Middle East > Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- France > Île-de-France
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Asia
- Russia (0.45)
- Singapore (0.04)
- Afghanistan (0.04)
- Indonesia > Bali (0.04)
- India > NCT
- New Delhi (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- South Korea > Seoul
- Seoul (0.04)
- China
- Middle East
- Saudi Arabia (0.04)
- Syria (0.04)
- UAE
- Dubai Emirate > Dubai (0.04)
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- Ankara Province > Ankara (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Africa
- Rwanda (0.04)
- South Africa > Western Cape
- Cape Town (0.04)
- Middle East > Egypt
- Cairo Governorate > Cairo (0.04)
- Genre:
- Workflow (0.93)
- Research Report > New Finding (0.67)
- Personal
- Industry:
- Automobiles & Trucks > Manufacturer (1.00)
- Information Technology (1.00)
- Law (1.00)
- Banking & Finance > Economy (1.00)
- Education (1.00)
- Consumer Products & Services > Restaurants (1.00)
- Leisure & Entertainment > Sports (1.00)
- Government > Regional Government
- North America Government > United States Government (1.00)
- Europe Government (1.00)
- Asia Government (1.00)
- Health & Medicine > Therapeutic Area
- Immunology (0.92)
- Infections and Infectious Diseases (0.92)
- Media
- Technology: