LLM Post-Training: A Deep Dive into Reasoning Large Language Models
Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Salman Khan, Fahad Shahbaz Khan
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have transformed the natural language processing landscape and enabled a wide range of applications. Pretraining on vast web-scale data has laid the foundation for these models, yet the research community is increasingly shifting its focus toward post-training techniques to achieve further breakthroughs. While pretraining provides a broad linguistic foundation, post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations. Fine-tuning, reinforcement learning, and test-time scaling have emerged as critical strategies for optimizing LLM performance, ensuring robustness, and improving adaptability across real-world tasks. This survey provides a systematic exploration of post-training methodologies, analyzing their role in refining LLMs beyond pretraining and addressing key challenges such as catastrophic forgetting, reward hacking, and inference-time trade-offs. We highlight emerging directions in model alignment, scalable adaptation, and inference-time reasoning, and outline future research directions. We also provide a public repository to continually track developments in this fast-evolving field: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training.
Feb-28-2025