Notes on the Mathematical Structure of GPT LLM Architectures
arXiv.org Artificial Intelligence
Introduction

When considered from a purely mathematical point of view, building and training a large (transformer) language model (LLM) amounts to constructing a function - which can be taken to be a map from one Euclidean space to another - with certain interesting properties. From a mathematician's point of view, it can therefore be frustrating that many key papers announcing significant new LLMs seem reluctant to spell out the details of the function they have constructed, whether in plain mathematical language or even in complete pseudo-code (the latter form of this complaint appears to be one of the motivations behind a recent article of Phuong and Hutter [1]). Here, we seek to give a relatively 'pure' mathematical description of the architecture of a GPT-3-style LLM. The architecture depends on parameters taking values in a space Θ; there is then a separate process - the training of the model - in which a particular value θ ∈ Θ is selected using a training algorithm. We will draw attention to such parameters as we introduce them, rather than attempting to give a definition of Θ up front.
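The framing above can be written compactly as a pair of displays; the symbols f, n, and m below are illustrative choices, not notation fixed by the source text:

```latex
% Illustrative sketch: the architecture defines a parameterised family of maps
\[
  f \colon \Theta \times \mathbb{R}^{n} \longrightarrow \mathbb{R}^{m},
\]
% and training selects a particular parameter value $\theta^{*}$,
% yielding the trained model as a single map between Euclidean spaces:
\[
  f_{\theta^{*}} \colon \mathbb{R}^{n} \longrightarrow \mathbb{R}^{m},
  \qquad \theta^{*} \in \Theta .
\]
```

In this reading, "architecture" specifies f (and the spaces involved), while "training" is the selection of θ* from Θ.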
Oct-25-2024