Reviews: Levenshtein Transformer

Neural Information Processing Systems

Detailed Comments: "two atomic operations -- insertion and deletion" -- This is somewhat debatable. Under LevT, an insertion operation first requires the number of slots to be predicted, and only then are the actual tokens predicted; this is not completely atomic. Section 1, "(up to 5x speed-up)"; Figure 4; Section 4.1, "Analysis of Efficiency" -- This reviewer thinks the paper is quite misleading in both the speed comparison and the iteration comparison. Figure 4 adds/subtracts U[0, 0.5) noise to the plotted values, which means it can subtract iterations -- this gives a misleading plot.
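The two-stage insertion the review describes can be sketched in a few lines. This is a toy illustration only: `predict_slots` and `predict_token` are hypothetical stand-ins for the model's learned placeholder and token classifiers, not the paper's actual interfaces.

```python
def insert(tokens, predict_slots, predict_token):
    """Toy two-stage insertion: predict slot counts, then fill them.

    predict_slots(left, right) -> int  (hypothetical slot classifier)
    predict_token(left, right) -> str  (hypothetical token classifier)
    """
    # Stage 1: for each gap between adjacent tokens, predict a slot count.
    slots = [predict_slots(tokens[i], tokens[i + 1])
             for i in range(len(tokens) - 1)]
    # Stage 2: fill every predicted placeholder slot with an actual token.
    out = [tokens[0]]
    for i, n in enumerate(slots):
        out += [predict_token(tokens[i], tokens[i + 1]) for _ in range(n)]
        out.append(tokens[i + 1])
    return out
```

Because stage 2 depends on the output of stage 1, the operation is a two-step prediction rather than a single atomic one, which is exactly the reviewer's point.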


Review for NeurIPS paper: Cascaded Text Generation with Markov Transformers

Neural Information Processing Systems

Weaknesses: While I am advocating for this paper's acceptance, I'm curious whether the authors think this will truly be the dominant approach going forward in this area. I find this approach theoretically more appealing than the Levenshtein Transformer, but I don't think that model's "global communication" is strictly a negative. Sure, the more local nature of this one gives a speedup, but successfully capturing long-range dependencies is one of the things transformer models like GPT-3 seem to be good at. This is a limitation of the paper's evaluating only on MT; in MT, the input heavily constrains the shape of the output, so long-range output dependencies may not be quite as necessary.


Levenshtein Transformer

Gu, Jiatao, Wang, Changhan, Zhao, Junbo

Neural Information Processing Systems

Modern neural sequence generation models are built to either generate tokens step-by-step from scratch or (iteratively) modify a sequence of tokens bounded by a fixed length. In this work, we develop Levenshtein Transformer, a new partially autoregressive model devised for more flexible and amenable sequence generation. Unlike previous approaches, the basic operations of our model are insertion and deletion. We also propose a set of new training techniques dedicated to them, effectively exploiting one as the other's learning signal thanks to their complementary nature. Experiments applying the proposed model achieve comparable or even better performance with much-improved efficiency on both generation (e.g. machine translation, text summarization) and refinement tasks (e.g. automatic post-editing).
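As a rough illustration of the delete/insert generation the abstract describes, here is a minimal sketch of the iterative refinement loop. The `delete_policy` and `insert_policy` callables are hypothetical stand-ins for the model's learned deletion and insertion classifiers, not the paper's actual implementation.

```python
def refine(seq, delete_policy, insert_policy, max_iters=10):
    """Iteratively apply a deletion pass then an insertion pass to seq.

    delete_policy(token) -> bool   keep the token? (hypothetical)
    insert_policy(tokens) -> list  tokens with any new ones spliced in
                                   (two-stage in the actual model)
    """
    for _ in range(max_iters):
        # Deletion pass: keep only tokens the policy marks as "keep".
        kept = [t for t in seq if delete_policy(t)]
        # Insertion pass: splice in any newly predicted tokens.
        new_seq = insert_policy(kept)
        if new_seq == seq:  # fixed point: no further edits proposed
            break
        seq = new_seq
    return seq
```

The complementary nature mentioned in the abstract shows up here: an over-eager insertion can be corrected by the next deletion pass, and vice versa, so the two policies can supervise each other during training.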