Reviews: Fast Structured Decoding for Sequence Models

Neural Information Processing Systems 

The paper proposes to improve the translation quality of a non-autoregressive neural machine translation (NART) system by attaching a conditional random field (CRF) to the decoder. The CRF narrows the quality gap relative to autoregressive neural translation systems by imposing a bigram-language-model-like structure on the decoder outputs, which alleviates the strong conditional independence assumption that NART architectures entail. The CRF is trained jointly with all other parameters of the network. Experiments on the WMT14 and IWSLT14 En-De and De-En tasks are reported to yield improvements of more than 6 BLEU points over the corresponding baselines. Note that by augmenting the decoder with a first-order Markov CRF, the resulting network is, strictly speaking, no longer a non-autoregressive system.
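To make the last point concrete, here is a minimal sketch of decoding with a generic first-order linear-chain CRF. It is not the authors' implementation. It assumes the decoder supplies per-position emission scores (standing in for NART logits, computed in parallel) and that a full V x V transition matrix is available. That is only feasible for a toy vocabulary; a realistic translation vocabulary would need a cheaper parameterization. The function names `viterbi_decode` and `sequence_score` are hypothetical. The Viterbi recursion runs sequentially over positions, so the chosen token at position t depends on the choice at t-1. The toy example also shows the transition scores correcting a repeated token that independent per-position argmax would produce.

```python
import numpy as np


def sequence_score(emissions, transitions, tokens):
    """Unnormalized CRF score of one token sequence (emissions + transitions)."""
    s = emissions[0, tokens[0]]
    for t in range(1, len(tokens)):
        s += transitions[tokens[t - 1], tokens[t]] + emissions[t, tokens[t]]
    return float(s)


def viterbi_decode(emissions, transitions):
    """Highest-scoring sequence under a first-order linear-chain CRF.

    emissions:   (T, V) per-position scores, e.g. logits of a non-autoregressive decoder
    transitions: (V, V) score for token i at position t-1 followed by token j at t
    (Both are assumptions of this sketch; a full V x V matrix only suits toy vocabularies.)
    """
    T, V = emissions.shape
    score = emissions[0].copy()             # best prefix score ending in each token at t = 0
    backptr = np.zeros((T, V), dtype=int)   # best previous token for each (t, token)
    for t in range(1, T):                   # sequential over positions: the "autoregressive" part
        # cand[i, j]: best prefix ending in i, then transition i -> j, then emit j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Follow back-pointers from the best final token.
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]


if __name__ == "__main__":
    # Toy example: independent argmax repeats token 0; a repeat-penalizing transition fixes it.
    emissions = np.array([[2.0, 1.0], [2.0, 1.9]])
    transitions = np.array([[-1.0, 0.0], [0.0, -1.0]])
    print(emissions.argmax(axis=1).tolist())          # [0, 0]
    print(viterbi_decode(emissions, transitions))     # [0, 1]
```

The emission scores can still be produced in a single parallel pass; only the comparatively cheap dynamic program over transitions is sequential. This is the sense in which the augmented model is no longer strictly non-autoregressive.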