Towards smaller, faster decoder-only transformers: Architectural variants and their implications

Suresh, Sathya Krishnan, P, Shunmugapriya

arXiv.org Artificial Intelligence 

Since the debut of ChatGPT, research on Large Language Models (LLMs) has increased notably across a broad range of disciplines, made possible by the accessibility of this technology to a diverse user base. This rapidly growing field has largely pursued two distinct paths: one scales the model dimensions, the training dataset, or both to enhance performance, while the other concentrates on refining smaller models (ranging from 1B to 7B parameters) with high-quality data. Despite these advances, structural modifications of the transformer architecture itself have been relatively overlooked. Recent studies challenge the necessity of perpetually increasing model sizes by demonstrating that the deeper layers of LLMs may have minimal influence on predictive outcomes. In this work, we explore modifications to the decoder-only transformer architecture to address current challenges in the scalability and practical application of LLMs.
