Author Feedback
Neural Information Processing Systems
We will add more details in the updated version.

To Reviewer #1: Yes, we agree that the improvements over state-of-the-art models are marginal. However, the main goal of this paper is to reduce the memory cost of vanilla self-attention while achieving slightly better performance. For the head-adaptive strategy, we reported results for both settings (head-adaptive and non-adaptive) on different tasks. We will show these in the next version.
Feb-10-2026, 06:05:45 GMT