Reviews: Decoding with Value Networks for Neural Machine Translation

Neural Information Processing Systems 

This paper addresses one of the limitations of NMT, the so-called exposure bias, which results from the fact that each word is chosen greedily. To this end, the authors build on standard reinforcement learning techniques and try to predict, for each outgoing transition of a given state, the expected reward that will be achieved if the system takes this transition. The article is overall very clear and the proposed ideas are quite appealing, even if many of the decisions seem quite ad hoc (e.g. [...]). More importantly, several implementation "details" are not specified. For instance, in Equation (6), the BLEU function is defined at the sentence level, whereas the actual BLEU metric is defined at the corpus level.
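To illustrate why the sentence-level versus corpus-level distinction matters, here is a minimal sketch of unsmoothed, single-reference BLEU computed both ways. It is not the paper's Equation (6), whose exact form is not reproduced in this review. The function names and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_stats(hyp, ref, max_n=4):
    """Clipped n-gram matches and totals per order, plus both lengths."""
    matches, totals = [], []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        ref_ngrams = ngram_counts(ref, n)
        matches.append(sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items()))
        totals.append(sum(hyp_ngrams.values()))
    return matches, totals, len(hyp), len(ref)


def bleu_from_stats(matches, totals, hyp_len, ref_len):
    """Unsmoothed BLEU: brevity penalty times geometric mean of precisions."""
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / len(matches)
    bp = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def sentence_bleu(hyp, ref, max_n=4):
    """BLEU computed from the statistics of a single sentence pair."""
    return bleu_from_stats(*bleu_stats(hyp, ref, max_n))


def corpus_bleu(hyps, refs, max_n=4):
    """BLEU computed after summing the statistics over all sentence pairs."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        m, t, h, r = bleu_stats(hyp, ref, max_n)
        matches = [a + b for a, b in zip(matches, m)]
        totals = [a + b for a, b in zip(totals, t)]
        hyp_len += h
        ref_len += r
    return bleu_from_stats(matches, totals, hyp_len, ref_len)


# Toy data (hypothetical): one perfect hypothesis and one very short one.
hyps = ["the cat sat on the mat".split(), "a dog".split()]
refs = ["the cat sat on the mat".split(), "a dog barks".split()]

sentence_scores = [sentence_bleu(h, r) for h, r in zip(hyps, refs)]
mean_sentence_bleu = sum(sentence_scores) / len(sentence_scores)
corpus_score = corpus_bleu(hyps, refs)
print(sentence_scores, mean_sentence_bleu, corpus_score)
```

In this toy case the two-word hypothesis contains no 3-grams or 4-grams, so its unsmoothed sentence-level score collapses to zero, while the corpus-level score pools the counts and only pays a brevity penalty. A sentence-level reward therefore needs some smoothing or other convention, which is the kind of detail the paper would have to specify.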