Towards smaller, faster decoder-only transformers: Architectural variants and their implications

Suresh, Sathya Krishnan, P, Shunmugapriya

Apr-23-2024–arXiv.org Artificial Intelligence

Since the debut of ChatGPT, research on Large Language Models (LLMs) has increased notably across a broad range of disciplines, made possible by the accessibility of this technology to a diverse user base. This rapidly growing field has largely pursued two distinct paths: one scales the model dimensions, the training dataset, or both to enhance performance, while the other concentrates on refining smaller models (ranging from 1B to 7B parameters) with high-quality data. Despite these advances, structural modifications of the transformer architecture itself have been relatively overlooked. Recent studies challenge the necessity of perpetually increasing model sizes by demonstrating that the deeper layers of LLMs may have minimal influence on predictive outcomes. In this work, we explore modifications to the decoder-only transformer architecture to address current challenges in the scalability and practical application of LLMs.

architecture, decoder block, dimension

Country:
- Asia > India > Puducherry (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
