A Mathematical Details

Neural Information Processing Systems 

We provide additional experimental results to supplement Section 6. In Section B.1, we include […]. In Section B.2, we present a few example outputs with a visualization […]. In Section B.3, we include results […]. Figure B.1 and Figure B.2 present the empirical […]. The bottom row presents the average number of decoder layers used. Figure B.5 presents two example outputs of CALM for instances from the machine translation […], and […]. We observe that the textual distance generally increases as we accelerate the decoding. Interestingly, and in line with our initial intuition, CALM distributes compute unevenly: it uses very few layers for certain "easy" tokens and allocates additional compute to "hard" tokens.
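The following minimal sketch makes the uneven allocation concrete. It shows per-token early exiting against a confidence threshold, and how an "average number of decoder layers used" value could be computed from the resulting exit layers. Several assumptions apply. Confidence is taken to be the gap between the two largest softmax probabilities at each layer, a softmax-response style measure. The threshold is held constant. The per-layer distributions are made-up toy numbers for a 4-layer decoder over a 3-token vocabulary. The helpers `softmax_response`, `exit_layer`, and `average_layers` are hypothetical and are not the authors' implementation.

```python
from typing import List, Sequence

# Assumed confidence measure: gap between the two largest probabilities.
def softmax_response(probs: Sequence[float]) -> float:
    top = sorted(probs, reverse=True)
    return top[0] - top[1]

# Return the 1-based index of the first layer whose confidence reaches the
# threshold; fall back to the last layer if no intermediate layer does.
def exit_layer(per_layer_probs: List[Sequence[float]], threshold: float) -> int:
    for i, probs in enumerate(per_layer_probs, start=1):
        if softmax_response(probs) >= threshold:
            return i
    return len(per_layer_probs)

# Average number of decoder layers used across a sequence of tokens.
def average_layers(tokens: List[List[Sequence[float]]], threshold: float) -> float:
    layers = [exit_layer(t, threshold) for t in tokens]
    return sum(layers) / len(layers)

# Toy per-layer next-token distributions (hypothetical values).
tokens = [
    # "easy" token: confident from the first layer
    [[0.9, 0.05, 0.05], [0.92, 0.04, 0.04], [0.95, 0.03, 0.02], [0.97, 0.02, 0.01]],
    # intermediate token: becomes confident at the second layer
    [[0.5, 0.3, 0.2], [0.8, 0.1, 0.1], [0.85, 0.1, 0.05], [0.9, 0.05, 0.05]],
    # "hard" token: only confident at the final layer
    [[0.4, 0.35, 0.25], [0.45, 0.35, 0.2], [0.55, 0.3, 0.15], [0.8, 0.1, 0.1]],
]

print([exit_layer(t, 0.5) for t in tokens])     # [1, 2, 4]
print(round(average_layers(tokens, 0.5), 2))    # 2.33
print(round(average_layers(tokens, 0.15), 2))   # 1.67 (lower threshold, fewer layers)
```

In this toy setup, lowering the threshold makes every token exit earlier. Decoding therefore speeds up, which is the regime in which the observed textual distance tends to grow. The "hard" token still consumes more layers than the "easy" one.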
