Probing the topology of the space of tokens with structured prompts

Robinson, Michael, Dey, Sourya, Kushner, Taisa

arXiv.org Artificial Intelligence 

The set of tokens T, when embedded within the latent space X of a large language model (LLM) can be thought of as a finite sample drawn from a distribution supported on a topological subspace of X. One can ask what the smallest (in the sense of inclusion) subspace and simplest (in terms of fewest free parameters) distribution can account for such a sample. Previous work[1] suggests that the smallest topological subspace from which tokens can be drawn is not manifold, but has structure consistent with a stratified manifold. That paper relied upon knowing the token input embedding function T X, which given each token t T, ascribes a representation in X. Because embeddings preserve topological structure, in this paper, we will study T by equating it with the image of the token input embedding function, thereby treating T both as the set of tokens and as a subspace of X. This subspace is called the token subspace of X. Usually X is taken to be Euclidean space R

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found