Unsupervised Domain Adaptation for Keyphrase Generation using Citation Contexts
Florian Boudin, Akiko Aizawa
arXiv.org Artificial Intelligence
Adapting keyphrase generation models to new domains typically involves few-shot fine-tuning with in-domain labeled data. However, annotating documents with keyphrases is often prohibitively expensive and impractical, requiring expert annotators. This paper presents silk, an unsupervised method designed to address this issue by extracting silver-standard keyphrases from citation contexts to create synthetic labeled data for domain adaptation. Extensive experiments across three distinct domains demonstrate that our method yields high-quality synthetic samples, resulting in significant and consistent improvements in in-domain performance over strong baselines.
Oct-1-2024
- Country:
- Asia
- China > Heilongjiang Province
- Daqing (0.04)
- Japan > Honshū
- Kansai > Osaka Prefecture
- Osaka (0.04)
- Kantō > Tokyo Metropolis Prefecture
- Tokyo (0.14)
- Middle East
- Singapore (0.04)
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Czechia > Prague (0.04)
- France > Pays de la Loire
- Loire-Atlantique > Nantes (0.04)
- Italy > Tuscany
- Florence (0.04)
- Netherlands (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- Switzerland (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- Dominican Republic (0.04)
- United States
- District of Columbia > Washington (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- New York > New York County
- New York City (0.14)
- Ohio > Franklin County
- Columbus (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Washington > King County
- Seattle (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Technology: