CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding

Aug-7-2024–arXiv.org Artificial Intelligence

We present CREST (Compact Retrieval-Based Speculative Decoding), a redesign of REST that allows it to be effectively "compacted". REST is a drafting technique for speculative decoding that retrieves, from a datastore, exact n-gram matches of the most recent n tokens generated by the target LLM. The key idea of CREST is to store only a subset of the smallest and most common n-grams in the datastore, with the hope of achieving comparable performance with less storage space. We found that storing a subset of n-grams both reduces storage space and improves performance. CREST matches REST's accepted token length with 10.6-13.5x less storage space. Recently, Speculative Decoding has gained traction for accelerating ... Although REST can achieve a high draft token acceptance rate, the static nature of the datastore introduces a new challenge regarding storage space.
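The retrieval and compaction ideas described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the dictionary-based datastore, the function names (build_datastore, compact, draft), and the parameters (max_n, draft_len, top_k) are all hypothetical, and reading "smallest and most common" as "at most max_n tokens long, then the top_k by frequency" is an assumption made for illustration only.

```python
from collections import Counter, defaultdict


def build_datastore(corpus, max_n, draft_len):
    """Map every n-gram (n = 1..max_n) to a Counter of the token
    sequences (up to draft_len tokens) that followed it in the corpus."""
    store = defaultdict(Counter)
    for tokens in corpus:
        for n in range(1, max_n + 1):
            # Stop early enough that each key has at least one following token.
            for i in range(len(tokens) - n):
                key = tuple(tokens[i:i + n])
                continuation = tuple(tokens[i + n:i + n + draft_len])
                store[key][continuation] += 1
    return store


def compact(store, max_n, top_k):
    """Assumed compaction rule: drop n-grams longer than max_n, then keep
    only the top_k remaining n-grams ranked by how often they occurred."""
    small = {k: v for k, v in store.items() if len(k) <= max_n}
    ranked = sorted(small, key=lambda k: sum(small[k].values()), reverse=True)
    return {k: small[k] for k in ranked[:top_k]}


def draft(store, context, max_n):
    """Look up the longest suffix of context (at most max_n tokens) that has
    an exact match in the datastore and return its most common continuation.
    Returns an empty list when no suffix matches."""
    for n in range(min(max_n, len(context)), 0, -1):
        key = tuple(context[-n:])
        if key in store:
            return list(store[key].most_common(1)[0][0])
    return []


# Toy corpus of token ids (hypothetical data).
corpus = [[1, 2, 3, 4, 5], [1, 2, 3, 6, 7]]
full = build_datastore(corpus, max_n=2, draft_len=2)
small = compact(full, max_n=1, top_k=2)
print(len(full), len(small))
print(draft(small, [0, 1], max_n=1))
```

In a real speculative decoding loop, the drafted tokens would then be verified by the target LLM, and only the accepted prefix would be kept. The sketch covers only the drafting side and the trade-off between datastore size and match coverage.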

Country:
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.71)
  - Machine Learning > Neural Networks (0.46)