Goto

Collaborating Authors

 Ling, Jiang


Xmodel-2 Technical Report

arXiv.org Artificial Intelligence

Xmodel-2 is a 1.2-billion-parameter large language model designed specifically for reasoning tasks. Its architecture enables different model scales to share a unified set of hyperparameters, allowing for extensive experimentation on smaller models and seamless transfer of optimal configurations to larger models. To maximize training efficiency and stability, Xmodel-2 employs the WSD learning rate scheduler from MiniCPM. Pretrained on 1.5 trillion tokens from diverse sources, Xmodel-2 achieves state-of-the-art performance in complex reasoning and agent-based tasks, while maintaining low training costs.


Xmodel-1.5: An 1B-scale Multilingual LLM

arXiv.org Artificial Intelligence

We introduce Xmodel-1.5, a 1-billion-parameter multilingual large language model pretrained on 2 trillion tokens, designed for balanced performance and scalability. Unlike most large models that use the BPE tokenizer, Xmodel-1.5 employs a custom unigram tokenizer with 65,280 tokens, optimizing both efficiency and accuracy. The model delivers competitive results across multiple languages, including Thai, Arabic, French, Chinese, and English, outperforming Alibaba's PolyLM-1.7B on respective evaluation datasets. Xmodel-1.5 excels in benchmarks like mMMLU and PIQA, and achieves state-of-the-art results in Thai. To support low-resource language research, we release Xdata_Thai, a Thai-specific evaluation dataset featuring unique linguistic challenges such as gendered particles and idioms. While the model demonstrates strong performance, there is still room for improvement in handling culturally specific nuances. We hope this work contributes to advancements in multilingual AI research. Models and code are publicly available on GitHub at https://github.com/XiaoduoAILab/XmodelLM-1.5