Token Distillation: Attention-aware Input Embeddings For New Tokens
Dobler, Konstantin; Elliott, Desmond; de Melo, Gerard
arXiv.org Artificial Intelligence
New tokens can be added to solve this problem, when coupled with a good initialization for their new embeddings. This excessive tokenization not only leads to reduced performance on downstream tasks (Rust et al., 2021; Ali et al., 2024) but also increases the computational [...] Although adding new tokens to a model's vocabulary can reduce over-tokenization, it [...] Whenever we wish to add a new token to a pretrained model's vocabulary, this new token may [...] The semantics of a word composed of multiple subtokens will largely not be stored in their raw input embeddings at all, but rather constructed by the Transformer's attention/feed-forward layer stack during contextualization (Elhage et al., 2022; Lad et al., 2024; [...]). We demonstrate the efficacy of our method, dubbed "Token Distillation", in Section 5. We illustrate [...] Our experimental setup is detailed in Section 4. In summary, our contributions are as follows. We motivate our proposed method by describing the fundamental limitations of current embedding initialization methods and empirically verify our claims. Most state-of-the-art Large Language Models (LLMs) are trained using a static tokenizer, usually derived by a byte-pair encoding scheme before model training (Sennrich et al., 2016). Furthermore, Lesci et al. (2025) show that in practice, words which are not a single [...] A solution to this problem is to modify the existing vocabulary to suit the specific needs.
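To make "initialization for their new embeddings" concrete, the sketch below shows one commonly used baseline: setting a new token's input embedding to the mean of the embeddings of the subtokens it replaces. This is not the Token Distillation method, which this summary does not specify. The vocabulary, the embedding values, and the helper `mean_init` are all hypothetical and chosen only for illustration.

```python
# Toy sketch: initialize a new token's input embedding as the mean of the
# embeddings of the subtokens it replaces. All names and values are hypothetical.

def mean_init(embeddings, subtoken_ids):
    """Return the element-wise mean of the selected embedding rows."""
    rows = [embeddings[i] for i in subtoken_ids]
    dim = len(rows[0])
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]

# Hypothetical 3-token vocabulary with 2-dimensional embeddings.
vocab = {"token": 0, "ization": 1, "the": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Add "tokenization" as a single new token, initialized from its two subtokens.
subtoken_ids = [vocab["token"], vocab["ization"]]
new_embedding = mean_init(embeddings, subtoken_ids)
vocab["tokenization"] = len(embeddings)
embeddings.append(new_embedding)
```

A baseline of this kind uses only the raw input embeddings of the subtokens. The summary's observation that the semantics of a multi-subtoken word are largely constructed by the attention/feed-forward layers during contextualization suggests why such an initialization may be limited.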
Nov-3-2025
- Country:
  - Africa > Middle East
    - Egypt > Giza Governorate > Giza (0.04)
  - Asia
    - China > Hong Kong (0.04)
    - Indonesia > Bali (0.04)
    - Middle East
      - Jordan (0.04)
      - Saudi Arabia > Asir Province
        - Abha (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.14)
    - Singapore (0.04)
    - South Korea (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Germany > Brandenburg
      - Potsdam (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - Dominican Republic (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - California > Santa Clara County
        - Palo Alto (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - Oklahoma > Payne County
        - Cushing (0.04)
      - Washington > King County
        - Seattle (0.04)
  - South America > Colombia
    - Meta Department > Villavicencio (0.04)
- Genre:
  - Research Report > New Finding (0.93)
- Industry:
  - Education (0.93)
  - Health & Medicine
    - Pharmaceuticals & Biotechnology (0.67)
    - Therapeutic Area (1.00)
- Technology: