We thank all the reviewers for their comments and suggestions!

1 To Reviewer #1

Neural Information Processing Systems 

Therefore, we choose fewer attention heads. Many previous works (e.g., [1][2][3]) use this batch size to evaluate inference latency. There is a typo in Table 2: the unit of latency should be "second". We will fix it in the new version of the paper.
