How Does Pretraining Improve Discourse-Aware Translation?