Speculative Decoding with Big Little Decoder

Neural Information Processing Systems 

Figure 1: Illustration of (Left) the normal autoregressive decoding procedure of a large model and (Right) BiLD that consists of a small model and a large model. In BiLD, the small model generates tokens autoregressively (i.e., sequentially) until it hands over control to the large model.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found