PLDR-LLM: Large Language Model from Power Law Decoder Representations
–arXiv.org Artificial Intelligence
We present the Large Language Model from Power Law Decoder Representations (PLDR-LLM), a language model that leverages non-linear and linear transformations through Power Law Graph Attention mechanism to generate well-defined deductive and inductive outputs. We pretrain the PLDR-LLMs of varying layer sizes with a small batch size of 32 and $\sim$8B tokens from the RefinedWeb dataset, and show that they achieve competitive performance in zero-shot and few-shot settings compared to scaled dot-product LLMs of similar model size reported in the literature. We show that deductive outputs of PLDR-LLMs can be used to compare model characteristics or improve the performance by introducing the Directed Acyclic Graph (DAG) loss as a metric and regularizer. Our results indicate that the initial maximum learning rate and warm-up steps have a lasting impact on deductive outputs throughout the pretraining. We provide a detailed description of PLDR-LLM architecture, its implementation and the pretraining procedure.
arXiv.org Artificial Intelligence
Oct-22-2024
- Country:
- Oceania > Australia
- North America > United States
- Oregon > Multnomah County
- Portland (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Oregon > Multnomah County
- Europe
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Ireland > Leinster
- Asia
- Middle East > Jordan (0.04)
- China > Hong Kong (0.04)
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Technology: