Parameter-Efficient Transformer Embeddings
Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which token embedding vectors are first generated deterministically, directly from the token IDs using a Fourier expansion of their normalized values, followed by a lightweight multilayer perceptron (MLP) that captures higher-order interactions. We train standard transformers and our architecture on natural language inference tasks (SNLI and MNLI), and evaluate zero-shot performance on sentence textual similarity (STS-B). Our results demonstrate that the proposed method achieves competitive performance using significantly fewer parameters, trains faster, and operates effectively without the need for dropout. This proof-of-concept study highlights the potential for scalable, memory-efficient language models and motivates further large-scale experimentation based on our findings. Code for reproducing our results and pre-trained weights are available at https://github.com/HMUNACHI/pete.
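The core idea can be illustrated with a minimal PyTorch sketch: token IDs are normalized, expanded into deterministic Fourier features, and passed through a small MLP in place of a learned embedding table. The frequency schedule, MLP width, and activation below are illustrative assumptions, not the authors' exact configuration; see the linked repository for the reference implementation.

```python
# Minimal sketch of a parameter-efficient Fourier-based token embedding.
# Assumptions: linear integer frequencies, a two-layer GELU MLP, and
# normalization of token IDs to [0, 1]; details may differ from the paper.
import torch
import torch.nn as nn


class FourierTokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_frequencies: int, embed_dim: int):
        super().__init__()
        self.vocab_size = vocab_size
        # Fixed (non-learned) frequencies for the deterministic expansion.
        self.register_buffer(
            "freqs", 2.0 * torch.pi * torch.arange(1, num_frequencies + 1).float()
        )
        # Lightweight MLP that captures higher-order interactions.
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_frequencies, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Normalize integer token IDs to [0, 1].
        x = token_ids.float().unsqueeze(-1) / (self.vocab_size - 1)
        # Deterministic Fourier expansion of the normalized ID.
        phases = x * self.freqs                                  # (..., F)
        features = torch.cat([torch.sin(phases), torch.cos(phases)], dim=-1)
        return self.mlp(features)                                # (..., embed_dim)


# Example usage: embed a batch of token IDs.
emb = FourierTokenEmbedding(vocab_size=30522, num_frequencies=64, embed_dim=256)
out = emb(torch.tensor([[101, 2023, 102]]))
print(out.shape)  # torch.Size([1, 3, 256])
```

Unlike a standard embedding table, whose parameter count grows with vocabulary size, the trainable parameters here live only in the small MLP, which is what makes the approach memory-efficient.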
arXiv.org Artificial Intelligence
May-6-2025