Parameter-Efficient Transformer Embeddings
Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which token embedding vectors are first generated deterministically, directly from the token IDs using a Fourier expansion of their normalized values, followed by a lightweight multilayer perceptron (MLP) that captures higher-order interactions. We train standard transformers and our architecture on natural language inference tasks (SNLI and MNLI), and evaluate zero-shot performance on sentence textual similarity (STS-B). Our results demonstrate that the proposed method achieves competitive performance using significantly fewer parameters, trains faster, and operates effectively without the need for dropout. This proof-of-concept study highlights the potential for scalable, memory-efficient language models and motivates further large-scale experimentation based on our findings. Code for reproducing our results and pre-trained weights are available at https://github.com/HMUNACHI/pete.
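The core idea can be illustrated with a minimal PyTorch sketch: token IDs are normalized, expanded into deterministic Fourier features, and passed through a small MLP in place of a learned embedding table. The frequency schedule, MLP width, and activation below are illustrative assumptions, not the authors' exact configuration; see the linked repository for the reference implementation.

```python
# Minimal sketch of a parameter-efficient Fourier-based token embedding.
# Assumptions: linear integer frequencies, a two-layer GELU MLP, and
# normalization of token IDs to [0, 1]; details may differ from the paper.
import torch
import torch.nn as nn


class FourierTokenEmbedding(nn.Module):
    def __init__(self, vocab_size: int, num_frequencies: int, embed_dim: int):
        super().__init__()
        self.vocab_size = vocab_size
        # Fixed (non-learned) frequencies for the deterministic expansion.
        self.register_buffer(
            "freqs", 2.0 * torch.pi * torch.arange(1, num_frequencies + 1).float()
        )
        # Lightweight MLP that captures higher-order interactions.
        self.mlp = nn.Sequential(
            nn.Linear(2 * num_frequencies, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Normalize integer token IDs to [0, 1].
        x = token_ids.float().unsqueeze(-1) / (self.vocab_size - 1)
        # Deterministic Fourier expansion of the normalized ID.
        phases = x * self.freqs                                  # (..., F)
        features = torch.cat([torch.sin(phases), torch.cos(phases)], dim=-1)
        return self.mlp(features)                                # (..., embed_dim)


# Example usage: embed a batch of token IDs.
emb = FourierTokenEmbedding(vocab_size=30522, num_frequencies=64, embed_dim=256)
out = emb(torch.tensor([[101, 2023, 102]]))
print(out.shape)  # torch.Size([1, 3, 256])
```

Unlike a standard embedding table, whose parameter count grows with vocabulary size, the trainable parameters here live only in the small MLP, which is what makes the approach memory-efficient.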
arXiv.org Artificial Intelligence
May-6-2025