Foundations of Large Language Models
–arXiv.org Artificial Intelligence
The development of neural sequence models such as Transformers [Vaswani et al., 2017], together with improvements in large-scale self-supervised learning, has opened the door to universal language understanding and generation. This achievement is driven largely by pre-training: we separate out the components common to many neural network-based systems and train them on huge amounts of unlabeled data using self-supervision. The resulting pre-trained models serve as foundation models that can be easily adapted to different tasks via fine-tuning or prompting. As a result, the NLP paradigm has changed dramatically. In many cases, large-scale supervised learning for specific tasks is no longer required; instead, we only need to adapt pre-trained foundation models.
Jan-15-2025
- Country:
  - Europe (1.00)
  - North America > United States (0.92)
- Genre:
  - Workflow (1.00)
  - Overview (1.00)
  - Research Report
    - Promising Solution (1.00)
    - New Finding (0.92)
- Industry:
  - Leisure & Entertainment > Sports (1.00)
  - Law (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Energy (1.00)
  - Education (1.00)
  - Banking & Finance (0.67)
  - Transportation > Ground
    - Road (0.92)
  - Health & Medicine
    - Therapeutic Area (1.00)
    - Consumer Health (0.67)
- Technology:
  - Information Technology > Artificial Intelligence
    - Cognitive Science > Problem Solving (1.00)
    - Representation & Reasoning
      - Optimization (1.00)
      - Uncertainty > Bayesian Inference (0.67)
    - Natural Language
      - Machine Translation (1.00)
      - Large Language Model (1.00)
      - Information Retrieval (1.00)
      - Chatbot (1.00)
      - Text Processing (0.92)
      - Grammars & Parsing (0.92)
    - Machine Learning
      - Statistical Learning (1.00)
      - Reinforcement Learning (1.00)
      - Neural Networks > Deep Learning (1.00)
      - Learning Graphical Models > Directed Networks
        - Bayesian Learning (0.45)