Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model
van der Lee, Chris, Ferreira, Thiago Castro, Emmery, Chris, Wiltshire, Travis, Krahmer, Emiel
–arXiv.org Artificial Intelligence
This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also used. This study aims to answer this question by comparing a data-to-text system supplemented only with a language model to two data-to-text systems that are additionally enriched by a semi-supervised learning approach: data augmentation or pseudo-labeling. Results show that semi-supervised learning leads to higher scores on diversity metrics. In terms of output quality, extending the training set of a data-to-text system with a language model using the pseudo-labeling approach did increase text quality scores, but the data augmentation approach yielded scores similar to those of the system without training set extension. These results indicate that semi-supervised learning approaches can bolster output quality and diversity, even when a language model is also present.
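The pseudo-labeling idea of extending a small training set can be illustrated with a generic self-training loop. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract does not say how pseudo-labels are generated, filtered, or weighted, so `extend_with_pseudo_labels`, the `train_fn` interface, and the memorizing `toy_train` "model" are hypothetical names introduced only for illustration. In a real system, `train_fn` would presumably wrap fine-tuning of a pretrained data-to-text model.

```python
from typing import Callable, List, Tuple

# A data-to-text training example: (structured data input, reference text).
Pair = Tuple[str, str]
# Hypothetical interface: trains a model on pairs and returns a text generator.
TrainFn = Callable[[List[Pair]], Callable[[str], str]]


def extend_with_pseudo_labels(
    labeled: List[Pair],
    unlabeled_inputs: List[str],
    train_fn: TrainFn,
) -> Tuple[Callable[[str], str], List[Pair]]:
    """Generic pseudo-labeling sketch (hypothetical helper, not the paper's code)."""
    # 1. Train an initial model on the small labeled set.
    teacher = train_fn(labeled)
    # 2. Generate pseudo-reference texts for data inputs that lack references.
    pseudo_pairs = [(x, teacher(x)) for x in unlabeled_inputs]
    # 3. Extend the training set; the original labeled list is left unmodified.
    extended = labeled + pseudo_pairs
    # 4. Retrain on the extended set and return the new model and the data.
    return train_fn(extended), extended


def toy_train(pairs: List[Pair]) -> Callable[[str], str]:
    """Hypothetical stand-in for training: memorize pairs, else fall back to a template."""
    memory = dict(pairs)

    def generate(data: str) -> str:
        return memory.get(data, f"The entity is {data}.")

    return generate


if __name__ == "__main__":
    model, train_set = extend_with_pseudo_labels(
        [("name=A", "A is a restaurant.")], ["name=B"], toy_train
    )
    print(train_set)
    print(model("name=B"))
```

With the toy model, the unlabeled input `name=B` receives the template text as its pseudo-label, and the retrained model then reproduces it. Practical pseudo-labeling setups often also filter low-confidence outputs before adding them to the training set; that step is omitted here for brevity.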
Jul-14-2022
- Country:
  - Africa > Ethiopia
    - Addis Ababa > Addis Ababa (0.04)
  - Asia
    - China > Hong Kong (0.04)
    - Japan > Honshū
      - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
    - Middle East > Republic of Türkiye
      - Ankara Province > Ankara (0.04)
  - Europe
    - Germany > Saarland
      - Saarbrücken (0.04)
    - Czechia > Prague (0.04)
    - United Kingdom > Scotland
      - City of Aberdeen > Aberdeen (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Belgium (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Netherlands
      - Gelderland > Arnhem (0.04)
      - North Holland > Amsterdam (0.04)
      - South Holland > The Hague (0.04)
    - Spain
    - Italy > Tuscany
      - Florence (0.04)
  - North America
    - Canada > British Columbia
    - Dominican Republic (0.04)
    - United States
      - California > Los Angeles County
        - Long Beach (0.04)
      - Michigan > Washtenaw County
        - Ann Arbor (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - New Mexico > Santa Fe County
        - Santa Fe (0.04)
      - New York > New York County
        - New York City (0.04)
      - Pennsylvania > Philadelphia County
        - Philadelphia (0.04)
  - South America
    - Brazil > Minas Gerais (0.04)
    - Chile > Santiago Metropolitan Region
      - Santiago Province > Santiago (0.04)
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (1.00)
- Industry:
  - Government > Regional Government
  - Leisure & Entertainment (0.67)
  - Media > News (0.67)
- Technology: