Foundations of Large Language Models
–arXiv.org Artificial Intelligence
The development of neural sequence models such as Transformers [Vaswani et al., 2017], together with improvements in large-scale self-supervised learning, has opened the door to universal language understanding and generation. This achievement is largely driven by pre-training: common components are separated out from many neural network-based systems and then trained on huge amounts of unlabeled data using self-supervision. The resulting pre-trained models serve as foundation models that can easily be adapted to different tasks via fine-tuning or prompting. As a result, the NLP paradigm has changed enormously: in many cases, large-scale supervised learning for specific tasks is no longer required; instead, we only need to adapt pre-trained foundation models.
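The pretrain-then-adapt paradigm the abstract describes can be illustrated with a toy sketch: one model, trained once, is reused for several tasks simply by changing the prompt. The `PretrainedLM` class and its canned responses below are hypothetical stand-ins, not a real library or the book's method; a real foundation model would generate continuations learned from unlabeled text.

```python
# Toy sketch of the pre-train / adapt paradigm (names are hypothetical).

class PretrainedLM:
    """Stand-in for a foundation model trained once on unlabeled text."""

    def generate(self, prompt: str) -> str:
        # A real model would continue the prompt from learned statistics;
        # here we key on task markers purely for illustration.
        if prompt.startswith("Translate to French:"):
            return "Bonjour"
        if prompt.startswith("Sentiment:"):
            return "positive"
        return ""

# One pretrained model serves many tasks: adaptation is just prompting,
# with no task-specific supervised training loop.
lm = PretrainedLM()
translation = lm.generate("Translate to French: Hello")
sentiment = lm.generate("Sentiment: I loved this book.")
```

The point of the sketch is the interface, not the internals: under this paradigm, task-specific behavior is elicited from a shared pretrained model rather than trained into separate per-task systems.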
Jan-15-2025