Synthetic Text Generation using Hypergraph Representations

Raman, Natraj; Shah, Sameena

arXiv.org Artificial Intelligence 

Synthetic text plays a vital role in data augmentation, model robustness, privacy preservation and scenario analysis. Generating it is usually formulated as conditional text generation, in which a given source document is transformed through substitution, paraphrasing, back-translation, mixup and similar operations [1] to obtain a modified document. We argue that conditioning on unstructured text limits the ability to mix text fragments coherently and produces transformations that are not confined to the essential information, a critical requirement for long-form text. Furthermore, explaining the generated text, and in particular detecting hallucinations [2], becomes challenging. We propose a decompose-and-expand technique for generating synthetic text: the semantic frames [3] of a source document are first extracted, and this compact interim form is then used to generate the transformed text.
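As a rough illustration of the interim form, the sketch below stores semantic frames as hyperedges over text fragments and then renders them back into text. The `Frame` and `Hypergraph` classes, the role names `agent` and `theme`, and the template-based `expand` function are all assumptions made for illustration, not the authors' implementation; an actual pipeline would presumably obtain the frames from a frame extractor and condition a generative model on them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Frame:
    """One semantic frame: a predicate plus role-labelled arguments (hypothetical schema)."""
    predicate: str
    roles: tuple  # tuple of (role_name, text_fragment) pairs


@dataclass
class Hypergraph:
    """Nodes are text fragments; each hyperedge is a frame linking several nodes."""
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_frame(self, frame: Frame) -> None:
        # A frame may connect any number of fragments, hence a hyperedge.
        self.nodes.update(fragment for _, fragment in frame.roles)
        self.edges.append(frame)

    def frames_with(self, fragment: str) -> list:
        """Return every frame whose hyperedge touches the given fragment."""
        return [f for f in self.edges
                if any(frag == fragment for _, frag in f.roles)]


def expand(graph: Hypergraph) -> str:
    """Toy 'expand' step: render each frame with a fixed template.

    This is a placeholder; a real system would condition a text
    generator on the frames instead of using a template.
    """
    sentences = []
    for frame in graph.edges:
        args = dict(frame.roles)
        agent = args.get("agent", "someone")
        theme = args.get("theme", "something")
        sentences.append(f"{agent} {frame.predicate} {theme}.")
    return " ".join(sentences)


# Hand-written frames stand in for the output of a frame extractor.
graph = Hypergraph()
graph.add_frame(Frame("acquired", (("agent", "the bank"), ("theme", "a fintech startup"))))
graph.add_frame(Frame("announced", (("agent", "the bank"), ("theme", "record profits"))))
text = expand(graph)
```

Because both frames share the fragment "the bank", `graph.frames_with("the bank")` returns both hyperedges. Under these assumptions, this shared-node structure is what would let fragments from different frames be mixed while staying tied to the extracted information.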