InsNet: An Efficient, Flexible, and Performant Insertion-based Text Generation Model

Oct-15-2022–arXiv.org Artificial Intelligence

Insertion-based text generation, which formulates the generation process as a sequence of token insertion operations, has received increasing attention in recent years. It has two major advantages over the prevalent left-to-right auto-regressive paradigm: 1) with parallel decoding, it reduces the decoding cost to sub-linear in the sequence length (Stern et al., 2019; Gu et al., 2019b), and 2) its flexible insertion orders may better recover and utilize the underlying linguistic structures of languages (Welleck et al., 2019; Gu et al., 2019a).

However, this new paradigm of text generation brings unique challenges, mostly in training efficiency. Unlike left-to-right auto-regressive decoders, which expand the context monotonically, insertion operations complicate the position information of each token as the context expands. Concretely, as shown in Figure 1, the absolute position of a token in a sequence changes constantly as insertions are applied. As a result, a naive implementation of insertion-based models (e.g., Stern et al. (2019); Gu et al. (2019b)) needs to re-encode the context with updated positional information for each token as the insertions proceed, yielding inefficient training with O(n) rounds of context re-encoding (where n is the sequence length).

To tackle this problem, previous insertion-based generation models such as Insertion Transformer (InsT) (Stern et al., 2019) and Levenshtein Transformer (LevT) (Gu et al., 2019b) propose parallel token insertion to reduce the number of insertion/re-encoding steps from O(n) to Θ(log n) for both training and inference. However, while this works well for machine translation, such parallel insertion falls short on high-entropy generation tasks such as open-domain dialogue systems (Li et al., 2017a), creative
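The position-shift problem described above can be illustrated with a small sketch. This is not the paper's implementation; the function name `insert_and_track` and the example tokens are hypothetical, chosen only to show how many already-placed tokens change absolute position at each insertion step.

```python
def insert_and_track(insertions):
    """Apply (slot, token) insertions one at a time.

    Returns the final sequence and, for each step, the number of
    previously placed tokens whose absolute position changed.
    """
    sequence = []
    shifted_per_step = []
    for slot, token in insertions:
        # Every token currently at index >= slot moves one place right.
        shifted_per_step.append(len(sequence) - slot)
        sequence.insert(slot, token)
    return sequence, shifted_per_step


# Arbitrary insertion order (illustrative only).
insertions = [(0, "cat"), (0, "the"), (2, "sat"), (1, "black")]
sequence, shifted = insert_and_track(insertions)

# Left-to-right order: each token is appended, so nothing ever shifts.
left_to_right = [(0, "the"), (1, "black"), (2, "cat"), (3, "sat")]
ltr_sequence, ltr_shifted = insert_and_track(left_to_right)

print(sequence, shifted)
print(ltr_sequence, ltr_shifted)
```

Both orders produce the same final sequence, but only the left-to-right order leaves every earlier token's absolute position untouched. Under absolute positional encoding, any step with a nonzero shift count would invalidate cached representations of the shifted tokens, which is the source of the re-encoding cost the text describes.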

large language model, machine learning, natural language
