ec24a54d62ce57ba93a531b460fa8d18-AuthorFeedback.pdf
–Neural Information Processing Systems
We experimented on using Sigmoid functions, and it does not work. To Reviewer #3: Our algorithm scales linearly w.r.t. the input size, which is not larger than any other algorithms.12 Thiscanberealized42 by a max operation, which is differentiable. Note that we don't need to use argmax operation to find the indexr.43 Instead, every decoded token is involved in the later computation.
Neural Information Processing Systems
Feb-10-2026, 23:47:01 GMT
- Technology: