ChuXin: 1.6B Technical Report

Zhuang, Xiaomin, Jiang, Yufan, He, Qiaozhi, Wu, Zhihua

May-8-2024–arXiv.org Artificial Intelligence

Unlike the majority of works that only open-sourced the model weights and architecture, we have made everything needed to train a model available, including the training data, the training process, and the evaluation code. Our goal is to empower and strengthen the open research community, fostering transparency and enabling a new wave of innovation in the field of language modeling. Furthermore, we extend the context length to 1M tokens through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance.

Countless models have been open-sourced on AI communities like HuggingFace to facilitate their use by researchers (Bai et al., 2023; Singer et al., 2024; Zhang et al., 2024). These models can broadly be divided into two categories: 1) Open source model weights and data sources, which constitute the vast majority.

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Commonsense Reasoning (0.69)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.89)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)
