YuLan-Mini: An Open Data-efficient Language Model
Yiwen Hu, Huatong Song, Jia Deng, Jiapeng Wang, Jie Chen, Kun Zhou, Yutao Zhu, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Ji-Rong Wen
arXiv.org Artificial Intelligence
Effective pre-training of large language models (LLMs) has been challenging due to the immense resource demands and the complexity of the technical processes involved. This paper presents a detailed technical report on YuLan-Mini, a highly capable base model with 2.42B parameters that achieves top-tier performance among models of similar parameter scale. Our pre-training approach focuses on enhancing training efficacy through three key technical contributions: an elaborate data pipeline that combines data cleaning with data scheduling strategies, a robust optimization method that mitigates training instability, and an effective annealing approach that incorporates targeted data selection and long-context training. Remarkably, YuLan-Mini, trained on 1.08T tokens, achieves performance comparable to industry-leading models that require significantly more data. To facilitate reproduction, we release the full details of the data composition for each training phase.
Dec-24-2024
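
The abstract highlights an annealing stage at the end of pre-training. As a rough illustration of what such a stage can look like in practice, the sketch below implements a warmup-stable-decay style learning-rate schedule with a final annealing phase. The function name, phase fractions, peak and minimum rates, and the 1 − sqrt decay shape are illustrative assumptions, not the paper's exact settings.

```python
import math

def wsd_lr(step, total_steps, peak_lr=1e-3, min_lr=1e-5,
           warmup_frac=0.01, anneal_frac=0.10):
    """Illustrative warmup-stable-decay schedule: linear warmup,
    constant plateau, then an annealing phase decaying to min_lr."""
    warmup_steps = int(total_steps * warmup_frac)
    anneal_steps = int(total_steps * anneal_frac)
    anneal_start = total_steps - anneal_steps

    if step < warmup_steps:
        # Linear warmup from 0 to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    if step < anneal_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Annealing phase: (1 - sqrt) decay from peak_lr down to min_lr.
    progress = (step - anneal_start) / max(1, anneal_steps)
    return min_lr + (peak_lr - min_lr) * (1.0 - math.sqrt(min(progress, 1.0)))

if __name__ == "__main__":
    total = 100_000
    for s in (0, 500, 50_000, 95_000, 100_000):
        print(f"step {s:>7}: lr = {wsd_lr(s, total):.2e}")
```

In such schedules, the annealing window is also where curated, higher-quality data and longer-context samples are typically introduced, which matches the targeted data selection and long-context training the abstract describes.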