Comparison with the vanilla SGD baseline

Neural Information Processing Systems 

We thank the reviewers for their comments. We will carefully revise the paper according to their suggestions.

Figure 1: Comparison of different learning schemes on the RotMNIST classification and IWSLT translation tasks.

For the NMT tasks, we used the same parameter settings as previous papers, as described in Section 5.2. The assistant model shows similar performance across different batch sizes. However, we will provide results on the raw ImageNet dataset and a large Transformer model in the revised version.
