Self-Compositional Data Augmentation for Scientific Keyphrase Generation
Houbre, Mael, Boudin, Florian, Daille, Beatrice, Aizawa, Akiko
arXiv.org Artificial Intelligence
State-of-the-art models for keyphrase generation require large amounts of training data to achieve good performance. However, obtaining keyphrase-labeled documents can be challenging and costly. To address this issue, we present a self-compositional data augmentation method. More specifically, we measure the relatedness of training documents based on their shared keyphrases, and combine similar documents to generate synthetic samples. The advantage of our method lies in its ability to create additional training samples that preserve domain coherence, without relying on external data or resources. Our results on multiple datasets spanning three different domains demonstrate that our method consistently improves keyphrase generation. A qualitative analysis of the keyphrases generated for the Computer Science domain confirms this improvement with respect to their representativeness.
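The idea of pairing documents by shared keyphrases can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not say how relatedness is scored or how documents are combined, so this sketch uses Jaccard overlap of keyphrase sets, a hypothetical `threshold` parameter, text concatenation, and a union of the two keyphrase lists. The function names `keyphrase_overlap` and `augment` are likewise hypothetical.

```python
from itertools import combinations


def keyphrase_overlap(a, b):
    """Jaccard overlap of two keyphrase lists (an assumed relatedness measure)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def augment(docs, threshold=0.3):
    """Build synthetic samples from pairs of related training documents.

    Assumption: two documents are "similar" when their keyphrase overlap
    reaches `threshold`; the synthetic sample concatenates both texts and
    keeps the union of their keyphrases in first-seen order.
    """
    synthetic = []
    for d1, d2 in combinations(docs, 2):
        if keyphrase_overlap(d1["keyphrases"], d2["keyphrases"]) >= threshold:
            # dict.fromkeys removes duplicates while preserving order
            merged = list(dict.fromkeys(d1["keyphrases"] + d2["keyphrases"]))
            synthetic.append(
                {"text": d1["text"] + " " + d2["text"], "keyphrases": merged}
            )
    return synthetic


# Toy training set (made-up documents, for illustration only)
docs = [
    {"text": "Doc A", "keyphrases": ["keyphrase generation", "data augmentation"]},
    {"text": "Doc B", "keyphrases": ["keyphrase generation", "transformers"]},
    {"text": "Doc C", "keyphrases": ["protein folding"]},
]

samples = augment(docs)
```

With this toy input, only Doc A and Doc B share a keyphrase, so a single synthetic sample is produced. Because the samples are built only from the training set itself, no external data is needed, which matches the motivation stated in the abstract.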
Nov-6-2024
- Country:
- Asia
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Czechia > Prague (0.04)
- France > Pays de la Loire
- Loire-Atlantique > Nantes (0.05)
- Germany > Berlin (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Italy > Tuscany
- Florence (0.04)
- Spain > Galicia
- Madrid (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- North America
- Canada
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Ontario > Toronto (0.04)
- Dominican Republic (0.04)
- United States
- California > Alameda County
- Berkeley (0.04)
- District of Columbia > Washington (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- New York
- New York County > New York City (0.05)
- Niagara County > Niagara Falls (0.04)
- North Carolina > Orange County
- Chapel Hill (0.04)
- Washington > King County
- Seattle (0.14)
- Genre:
- Research Report > New Finding (0.88)
- Technology: