Large Language Model Inference with Lexical Shortlisting

Bogoychev, Nikolay, Chen, Pinzhen, Haddow, Barry, Birch, Alexandra

Nov-16-2023–arXiv.org Artificial Intelligence

Large language model (LLM) inference is computationally and memory intensive, so we adapt lexical shortlisting to it in the hope of improving both. While lexical shortlisting is well explored in tasks like machine translation, it requires modification before it is suitable for LLMs, as the intended applications vary significantly. Our work studies two heuristics for shortlisting a sub-vocabulary at LLM inference time: Unicode-based script filtering and corpus-based selection. We explore different LLM families and sizes, and we find that lexical shortlisting can reduce the memory usage of some models by nearly 50% and has an upper bound of 25% improvement in generation speed. In this pilot study, we also identify the drawbacks of such vocabulary selection methods and propose avenues for future research.
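
The two heuristics named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`script_shortlist`, `corpus_shortlist`, `slice_rows`), the toy vocabulary, and the use of the first word of a character's Unicode name as a rough script label are all assumptions made for illustration. The idea shown is only that a shortlist is a set of token ids, and that keeping just those rows of the output embedding matrix shrinks the final projection.

```python
import unicodedata
from collections import Counter


def char_script(ch):
    """Rough script label: the first word of the Unicode character name.

    This is an approximation chosen for the sketch, not the Unicode
    Script property itself.
    """
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:  # code points without a name, e.g. control characters
        return "UNKNOWN"


def script_shortlist(vocab, allowed_scripts):
    """Keep ids of tokens whose alphabetic characters all use allowed scripts.

    Tokens with no alphabetic characters (punctuation, digits) are kept.
    """
    keep = []
    for idx, token in enumerate(vocab):
        letters = [c for c in token if c.isalpha()]
        if all(char_script(c) in allowed_scripts for c in letters):
            keep.append(idx)
    return keep


def corpus_shortlist(token_id_corpus, min_count=1):
    """Keep ids of tokens seen at least min_count times in a tokenized corpus."""
    counts = Counter(i for sentence in token_id_corpus for i in sentence)
    return sorted(i for i, n in counts.items() if n >= min_count)


def slice_rows(matrix, keep):
    """Keep only the shortlisted rows of a (vocab_size x hidden) matrix."""
    return [matrix[i] for i in keep]


# Hypothetical toy vocabulary and a 7 x 2 output embedding matrix.
vocab = ["the", "cat", "кот", "猫", "##ing", "!", "дом"]
output_embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]

latin_ids = script_shortlist(vocab, {"LATIN"})
seen_ids = corpus_shortlist([[0, 1, 5], [0, 4]])
small_matrix = slice_rows(output_embeddings, latin_ids)
```

In this sketch, generated ids index into the shortlist, so a real system would also need to map them back to the original vocabulary ids (here, `latin_ids[k]` for shortlist position `k`).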

computational linguistics, language model, translation

Country:
- North America
  - Dominican Republic (0.04)
  - United States > Maryland
    - Baltimore (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe > Czechia
  - Prague (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)