Lexinvariant Language Models

Neural Information Processing Systems 

Token embeddings, which map discrete lexical symbols to continuous vectors, are at the heart of any language model (LM). However, the meaning of a lexical symbol can also be determined, and even redefined, by its structural role in a long context. In this paper, we ask: is it possible for a language model to be performant without \emph{any} fixed token embeddings? Such a language model would have to rely entirely on the co-occurrence and repetition of tokens in the context rather than the \textit{a priori} identity of any token. To answer this, we study \textit{lexinvariant} language models that are invariant to lexical symbols and therefore do not need fixed token embeddings in practice.
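
The following is a minimal illustrative sketch (not the authors' implementation) of the lexinvariance property described above: a model is lexinvariant if relabeling the vocabulary with any permutation leaves its next-token predictions unchanged up to the same relabeling. The `predict_fn` interface, function name, and parameters are hypothetical placeholders.

\begin{verbatim}
import numpy as np

def is_lexinvariant(predict_fn, context, vocab_size,
                    n_trials=5, atol=1e-5, seed=0):
    """Empirically test the lexinvariance property.

    predict_fn: hypothetical interface mapping a list of token ids to a
        length-`vocab_size` next-token probability vector.
    context: list of token ids.
    """
    rng = np.random.default_rng(seed)
    base = np.asarray(predict_fn(context))
    for _ in range(n_trials):
        # Random relabeling of the vocabulary: old id t -> new id pi[t].
        pi = rng.permutation(vocab_size)
        permuted_context = [int(pi[t]) for t in context]
        permuted_probs = np.asarray(predict_fn(permuted_context))
        # Undo the relabeling on the output distribution and compare:
        # a lexinvariant model satisfies p(pi(x) | pi(ctx)) == p(x | ctx).
        if not np.allclose(permuted_probs[pi], base, atol=atol):
            return False
    return True
\end{verbatim}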