Vertical LoRA: Dense Expectation-Maximization Interpretation of Transformers
arXiv.org Artificial Intelligence
In recent years, the field of machine learning, and natural language processing (NLP) in particular, has undergone a transformative evolution, catalyzed by the advent of Transformer models and large language models. These models are known for their emergent ability to comprehend and generate human-like text. Specifically, Transformer models appear to improve qualitatively as their parameter counts grow, achieving unprecedented performance across a spectrum of tasks, including text generation, machine translation, text summarization, question answering, and visual understanding. This finding has driven a trend of scaling models up to millions and even billions of parameters, exemplified by OpenAI's GPT[1, 2], Google's BERT[3], Meta's Llama[4], and Anthropic's Claude[5]. However, this growth in model size has created a significant barrier for ordinary individuals seeking to train such models on consumer hardware.
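The parameter-count barrier described above is what motivates low-rank adaptation methods such as the LoRA of the title. As a rough illustration (not the paper's Vertical LoRA method itself, whose details are not given here), the standard LoRA idea keeps a pretrained weight matrix W frozen and trains only two small factors B and A of rank r, so the effective weight is W + BA; the variable names and dimensions below are illustrative assumptions:

```python
import numpy as np

# Illustrative sketch of a standard LoRA update: instead of training a full
# d_out x d_in matrix W, train low-rank factors B (d_out x r) and A (r x d_in)
# with r << min(d_out, d_in). The effective weight is W + B @ A.
rng = np.random.default_rng(0)
d_in, d_out, r = 1024, 1024, 8

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in))       # trainable low-rank factor
B = np.zeros((d_out, r))                 # trainable; zero-init, so W is unchanged at start

def lora_forward(x):
    # y = (W + B A) x, computed without materializing the full d_out x d_in update
    return W @ x + B @ (A @ x)

full_params = d_out * d_in          # 1,048,576 trainable values for a full update
lora_params = r * (d_in + d_out)    # 16,384 trainable values for the low-rank update
print(f"full: {full_params}, LoRA: {lora_params}, ratio: {lora_params / full_params:.4f}")
```

With these dimensions the trainable parameter count drops by roughly 64x, which is the kind of saving that makes fine-tuning feasible on consumer hardware.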
Jun-13-2024