Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers

Fu, Zhuolin

arXiv.org Artificial Intelligence 

In recent years, the field of machine learning, especially natural language processing (NLP), has witnessed a transformative evolution, primarily catalyzed by the advent of Transformer models and large language models. These models are known for their emergent ability to comprehend and generate human-like text. In particular, Transformer models improve markedly as their parameter count grows, achieving unprecedented performance across a spectrum of tasks, including text generation, machine translation, text summarization, question answering, and visual understanding. This observation has driven a trend of scaling models to millions and even billions of parameters, exemplified by OpenAI's GPT[1, 2], Google's BERT[3], Meta's Llama[4], and Anthropic's Claude[5]. However, this growth in model size has simultaneously erected a significant barrier for ordinary individuals seeking to train such models on consumer hardware.
