Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers

Fu, Zhuolin

arXiv.org Artificial Intelligence 

In recent years, the field of machine learning, especially natural language processing (NLP), has witnessed a transformative evolution, primarily catalyzed by the advent of Transformer models and large language models. These models are known for their emergent ability to comprehend and generate human-like text. In particular, Transformer models improve markedly as their parameter count grows, achieving unprecedented performance across a spectrum of tasks, including text generation, machine translation, text summarization, question answering, and visual understanding. This observation has driven a trend of scaling models to millions and even billions of parameters, exemplified by OpenAI's GPT[1, 2], Google's BERT[3], Meta's Llama[4], and Anthropic's Claude[5]. However, this growth in model size has simultaneously erected a significant barrier for ordinary individuals seeking to train such models on consumer hardware.
