
Neural Information Processing Systems 

We will add more details in the updated version.

To Reviewer #1: Yes, we agree that the improvements compared with state-of-the-art models are marginal. But the main goal of this paper is to reduce the memory cost of vanilla self-attention while achieving slightly better performance. For the head-adaptive strategy, we reported results for both settings (head-adaptive and non-adaptive) on different tasks. We will show these in the next version.
