Foundations of Large Language Models
–arXiv.org Artificial Intelligence
The development of neural sequence models such as Transformers [Vaswani et al., 2017], together with improvements in large-scale self-supervised learning, has opened the door to universal language understanding and generation. This achievement is largely driven by pre-training: common components are separated out from many neural network-based systems and then trained on huge amounts of unlabeled data using self-supervision. The resulting pre-trained models serve as foundation models that can easily be adapted to different tasks via fine-tuning or prompting. As a result, the NLP paradigm has changed enormously: in many cases, large-scale supervised learning for specific tasks is no longer required; instead, we only need to adapt pre-trained foundation models.
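The pretrain-then-adapt paradigm the abstract describes can be illustrated with a toy sketch: one model, trained once, is reused for several tasks simply by changing the prompt. The `PretrainedLM` class and its canned responses below are hypothetical stand-ins, not a real library or the book's method; a real foundation model would generate continuations learned from unlabeled text.

```python
# Toy sketch of the pre-train / adapt paradigm (names are hypothetical).

class PretrainedLM:
    """Stand-in for a foundation model trained once on unlabeled text."""

    def generate(self, prompt: str) -> str:
        # A real model would continue the prompt from learned statistics;
        # here we key on task markers purely for illustration.
        if prompt.startswith("Translate to French:"):
            return "Bonjour"
        if prompt.startswith("Sentiment:"):
            return "positive"
        return ""

# One pretrained model serves many tasks: adaptation is just prompting,
# with no task-specific supervised training loop.
lm = PretrainedLM()
translation = lm.generate("Translate to French: Hello")
sentiment = lm.generate("Sentiment: I loved this book.")
```

The point of the sketch is the interface, not the internals: under this paradigm, task-specific behavior is elicited from a shared pretrained model rather than trained into separate per-task systems.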
Jan-15-2025