01a0683665f38d8e5e567b3b15ca98bf-AuthorFeedback.pdf

Neural Information Processing Systems 

R1 - Long-Term Dependencies: It is a good point that there is a tradeoff between better inference and longer-term dependencies among outputs. We note that this method is a compromise in this regard between NAT and fully autoregressive models. It is true that we lack the long-term dependencies of very long-term models (GPT-3); however, it is an open question whether all these dependencies are required for conditional generation.

Lastly, we propose tree decoding to make the parallel complexity sublinear, whereas Sun et al.'s work is linear.

- Why max marginals and not ngram scores: The problem with using ngram scores is that they do not consider compatibility with other positions.
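To illustrate the distinction between local ngram scores and max marginals, the following is a minimal sketch on a first-order (bigram) chain using generic max-product forward and backward passes. It is not the implementation from the paper: the function name `bigram_max_marginals`, the toy potentials, and the restriction to bigrams are all assumptions made for illustration.

```python
import numpy as np


def bigram_max_marginals(log_potentials):
    """Max-marginals for every bigram on a chain (illustrative sketch).

    log_potentials has shape (T-1, V, V); entry [t, a, b] is the local score
    of token a at position t followed by token b at position t+1.
    Returns an array of the same shape whose entry [t, a, b] is the score of
    the best full sequence that is forced to use bigram (a, b) at position t.
    """
    num_edges, vocab, _ = log_potentials.shape

    # forward[t, a]: best score of any prefix that ends with token a at position t.
    forward = np.zeros((num_edges + 1, vocab))
    for t in range(num_edges):
        forward[t + 1] = np.max(forward[t][:, None] + log_potentials[t], axis=0)

    # backward[t, a]: best score of any suffix that starts with token a at position t.
    backward = np.zeros((num_edges + 1, vocab))
    for t in range(num_edges - 1, -1, -1):
        backward[t] = np.max(log_potentials[t] + backward[t + 1][None, :], axis=1)

    # Best prefix into a, plus the local bigram score, plus best suffix out of b.
    return forward[:-1, :, None] + log_potentials + backward[1:, None, :]


# Hypothetical toy example: 3 positions, vocabulary of 2 tokens.
toy = np.array([
    [[2.0, 3.0], [0.0, 0.0]],   # edge between positions 0 and 1
    [[2.5, 0.0], [0.0, 0.0]],   # edge between positions 1 and 2
])
mm = bigram_max_marginals(toy)

# Best bigram at edge 0 by local score versus by max-marginal.
local_best = np.unravel_index(np.argmax(toy[0]), toy[0].shape)
global_best = np.unravel_index(np.argmax(mm[0]), mm[0].shape)
```

In this toy case the locally highest-scoring bigram at the first edge is (0, 1), but it forces token 1 at the middle position, which is incompatible with the strong bigram (0, 0) at the second edge. The max-marginal accounts for that and prefers (0, 0) at the first edge.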
