PhoneLM: An Efficient and Capable Small Language Model Family through Principled Pre-training

Rongjie Yi, Xiang Li, Weikai Xie, Zhenyan Lu, Chenghua Wang, Ao Zhou, Shangguang Wang, Xiwen Zhang, Mengwei Xu

arXiv.org Artificial Intelligence 

Interest in developing small language models (SLMs) for on-device deployment is growing fast. However, existing SLM designs hardly consider device hardware characteristics. This work instead presents a simple yet effective principle for SLM design: search for a (near-)optimal architecture in terms of runtime efficiency before pre-training. Guided by this principle, we develop the PhoneLM SLM family (currently in 0.5B and 1.5B versions), which achieves a state-of-the-art capability-efficiency tradeoff among models of similar parameter size. For reproducibility and transparency, we fully open-source the code, weights, and training datasets of PhoneLM, including both base and instructed versions. We also release a fine-tuned version of PhoneLM capable of accurate Android Intent invocation, along with an end-to-end Android demo. All materials are available at https://github.com/UbiquitousLearning/PhoneLM.
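The stated principle (search the architecture space for runtime efficiency before pre-training) can be illustrated with a minimal sketch. This is not the authors' actual pipeline: the candidate grid, the stand-in PyTorch transformer stack, and the CPU-latency proxy below are assumptions for illustration only; a real search would benchmark the full model architecture on the target phone hardware.

```python
# Hypothetical sketch: benchmark candidate SLM architectures for runtime
# efficiency *before* committing to pre-training, then keep only the winner.
import time
import itertools
import torch
import torch.nn as nn


def build_candidate(hidden: int, layers: int, heads: int) -> nn.Module:
    # Stand-in transformer stack; the real search would use the actual
    # PhoneLM decoder architecture rather than this generic encoder.
    layer = nn.TransformerEncoderLayer(
        d_model=hidden, nhead=heads, dim_feedforward=4 * hidden, batch_first=True
    )
    return nn.TransformerEncoder(layer, num_layers=layers)


@torch.no_grad()
def measure_latency(model: nn.Module, seq_len: int = 128, runs: int = 10) -> float:
    # Rough efficiency proxy: average forward-pass time on this machine.
    model.eval()
    x = torch.randn(1, seq_len, model.layers[0].self_attn.embed_dim)
    model(x)  # warm-up
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Illustrative candidate grid of (hidden size, depth, attention heads).
    candidates = itertools.product([1024, 1536], [16, 24], [16])
    results = []
    for hidden, layers, heads in candidates:
        model = build_candidate(hidden, layers, heads)
        params = sum(p.numel() for p in model.parameters())
        results.append((measure_latency(model), params, (hidden, layers, heads)))
    # Select the most runtime-efficient configuration; only this one
    # would then proceed to (expensive) pre-training.
    best = min(results)
    print(f"selected config {best[2]} ({best[1] / 1e6:.0f}M params, {best[0] * 1e3:.1f} ms/step)")
```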