FLM-101B: An Open LLM and How to Train It with $100K Budget

Li, Xiang, Yao, Yiqun, Jiang, Xin, Fang, Xuezhi, Meng, Xuying, Fan, Siqi, Han, Peng, Li, Jing, Du, Li, Qin, Bowen, Zhang, Zheng, Sun, Aixin, Wang, Yequan

Sep-17-2023–arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluations. In this paper, we report a solution to significantly reduce LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM with 0.31T tokens can be trained with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing evaluations that focus on knowledge-oriented abilities. These IQ evaluations include symbolic mapping, rule understanding, pattern mining, and anti-interference. Such evaluations minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, trained with a budget of 100K US dollars, achieves performance comparable to powerful and well-known models, e.g., GPT-3 and GLM-130B, especially on the additional range of IQ evaluations. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
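The headline figures in the abstract can be turned into rough unit costs. The snippet below is only a back-of-the-envelope sketch: the three constants are taken from the abstract, while the helper names `usd_per_billion_tokens` and `tokens_per_parameter` are illustrative and not from the paper. Because the abstract attributes the savings to a growth strategy, the model was presumably not at its full 101B size for every token, so the tokens-per-parameter ratio is a coarse summary, not a statement about the training schedule.

```python
# Figures quoted in the abstract.
BUDGET_USD = 100_000   # stated training budget in US dollars
PARAMS = 101e9         # final parameter count (101B)
TOKENS = 0.31e12       # training tokens (0.31T)


def usd_per_billion_tokens(budget_usd: float, tokens: float) -> float:
    """Average spend per one billion training tokens (hypothetical helper)."""
    return budget_usd / (tokens / 1e9)


def tokens_per_parameter(tokens: float, params: float) -> float:
    """Ratio of training tokens to final parameter count (hypothetical helper)."""
    return tokens / params


if __name__ == "__main__":
    print(f"USD per billion tokens: {usd_per_billion_tokens(BUDGET_USD, TOKENS):.2f}")
    print(f"Tokens per parameter:   {tokens_per_parameter(TOKENS, PARAMS):.2f}")
```

These averages say nothing about hardware, throughput, or how the budget was split across growth stages; the abstract does not give those details.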

evaluation, flm-101b, language model

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - United States
    - Maryland > Baltimore (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
  - Puerto Rico > San Juan
    - San Juan (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy
    - Tuscany > Florence (0.04)
    - Calabria > Catanzaro Province
      - Catanzaro (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Asia
  - Thailand (0.04)
  - Singapore (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
  - Japan > Honshū
    - Chūbu > Toyama Prefecture > Toyama (0.04)
  - China
    - Beijing > Beijing (0.04)
    - Sichuan Province > Chengdu (0.04)
    - Heilongjiang Province > Harbin (0.04)
    - Guangdong Province > Shenzhen (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Education (0.92)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.90)
