Xmodel-2 Technical Report

Qun Wang, Yang Liu, Qingquan Lin, Zhijiu Qu, Ling Jiang

Dec-27-2024–arXiv.org Artificial Intelligence

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs.
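The WSD (Warmup-Stable-Decay) scheduler mentioned above splits training into three phases: a short warmup, a long plateau at the peak learning rate, and a final decay. The following is a minimal sketch of that shape only, not the schedule actually used for Xmodel-2 or MiniCPM. The function name `wsd_lr`, all of its parameters, and the choice of a linear decay are assumptions made for illustration; the abstract does not specify the decay form or any step counts.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at a 0-indexed `step` under a warmup-stable-decay shape.

    Hypothetical helper: phase lengths and the linear decay are illustrative
    assumptions, not values taken from the report.
    """
    # Phase 1: linear warmup from peak_lr / warmup_steps up to peak_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Phase 2: hold the peak learning rate constant.
    if step < warmup_steps + stable_steps:
        return peak_lr

    # Phase 3: decay linearly from peak_lr to min_lr, then stay at min_lr.
    steps_into_decay = step - warmup_steps - stable_steps
    progress = min(1.0, steps_into_decay / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    # Toy configuration: 10 warmup steps, 100 stable steps, 20 decay steps.
    for s in (0, 9, 50, 110, 120, 130):
        print(s, wsd_lr(s, peak_lr=1.0, warmup_steps=10, stable_steps=100, decay_steps=20))
```

Because the stable phase holds a constant rate, a run can in principle be extended or branched into a decay phase from any checkpoint on the plateau, which is one commonly cited reason for preferring this shape over a cosine schedule with a fixed horizon.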

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning (1.00)
  - Natural Language > Large Language Model (0.90)