We thank all the reviewers for their comments and suggestions!
1 To Reviewer #1
Neural Information Processing Systems
Therefore, we chose fewer attention heads. Many previous works (e.g., [1][2][3]) use this batch size to evaluate inference latency. There is a typo in Table 2: the unit of latency should be "seconds". We will fix it in the revised version of the paper.
Sep-28-2025, 08:26:19 GMT