GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and to unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we faced numerous unexpected technical and engineering challenges, particularly around loss spikes and divergence. In this paper, we introduce the training process of GLM-130B, including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resulting GLM-130B model significantly outperforms GPT-3 175B (davinci) on a wide range of popular English benchmarks, a performance advantage not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B, the largest Chinese language model, across related benchmarks. Finally, we leverage a unique scaling property of GLM-130B to reach INT4 quantization without post-training, with almost no performance loss, making it the first among 100B-scale models to do so and, more importantly, allowing effective inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs, the most affordable GPUs required for using 100B-scale models. The GLM-130B model weights are publicly accessible, and its code, training logs, related toolkit, and lessons learned are open-sourced at https://github.com/THUDM/GLM-130B/.
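The INT4 result above refers to weight-only quantization applied directly to the trained weights, with no quantization-aware training. As a rough, self-contained illustration of what absmax-scaled, round-to-nearest INT4 weight quantization involves, here is a NumPy sketch; the function names and the per-row scaling granularity are illustrative choices of this sketch, not GLM-130B's actual implementation:

```python
import numpy as np

def quantize_int4_absmax(w: np.ndarray):
    """Symmetric round-to-nearest INT4 quantization with one scale per row.

    Illustrative only: real pipelines quantize the linear layers of a
    trained transformer and pack two 4-bit values per byte.
    """
    # Signed INT4 spans [-8, 7]; scale so the largest |weight| maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover approximate FP32 weights for use in matrix multiplies.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = quantize_int4_absmax(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

Storing 4 bits per weight instead of 16 is what brings a 130B-parameter model within reach of consumer GPUs such as the RTX 3090s mentioned above.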
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang
Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost and (ii) fair and objective evaluations. In this paper, we report a solution that significantly reduces LLM training cost through a growth strategy: we demonstrate that a 101B-parameter LLM can be trained on 0.31T tokens with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing knowledge-oriented evaluations. These IQ evaluations cover symbolic mapping, rule understanding, pattern mining, and anti-interference, and they minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, trained with a budget of 100K US dollars, achieves performance comparable to powerful and well-known models such as GPT-3 and GLM-130B, especially on the additional range of IQ evaluations. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
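A growth strategy keeps cost down by training a smaller model first, then expanding it into a larger one that computes (approximately) the same function, so training resumes from the grown checkpoint instead of from scratch. Below is a toy NumPy sketch of function-preserving width growth for a two-layer MLP, in the spirit of Net2Net-style expansion; the operator, its name (`widen_linear_pair`), and the details are assumptions of this sketch and differ from FLM-101B's actual growth procedure:

```python
import numpy as np

def widen_linear_pair(w1, b1, w2, new_hidden, rng):
    """Grow the hidden width of a two-layer MLP (x @ w1 + b1 -> ReLU -> @ w2)
    while preserving its function: copy existing hidden units into the new
    slots and split the outgoing weights of every duplicated unit."""
    old_hidden = w1.shape[1]
    # Each new slot copies a randomly chosen existing unit.
    mapping = np.concatenate([np.arange(old_hidden),
                              rng.integers(0, old_hidden, new_hidden - old_hidden)])
    counts = np.bincount(mapping, minlength=old_hidden)  # duplication count per unit
    w1_new = w1[:, mapping]
    b1_new = b1[mapping]
    w2_new = w2[mapping] / counts[mapping, None]  # split each unit's output contribution
    return w1_new, b1_new, w2_new

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w1, b1, w2 = rng.normal(size=(16, 32)), rng.normal(size=32), rng.normal(size=(32, 8))
y_small = np.maximum(x @ w1 + b1, 0.0) @ w2

w1n, b1n, w2n = widen_linear_pair(w1, b1, w2, new_hidden=64, rng=rng)
y_big = np.maximum(x @ w1n + b1n, 0.0) @ w2n
print("max diff:", np.abs(y_small - y_big).max())  # ~0: same function, more capacity
```

Because the widened network starts out computing the same function as the small one, the expensive large-model phase begins from an already-trained state, which is where the budget savings come from.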
GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation
Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen
We introduce GENIUS: a conditional text generation model that takes sketches as input and fills in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens). GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction-from-sketch objective using an extreme and selective masking strategy, enabling it to generate diverse and high-quality texts from sketches. Comparison with other competitive conditional language models (CLMs) reveals the superiority of GENIUS's text generation quality. We further show that GENIUS can serve as a strong, ready-to-use data augmentation tool for various natural language processing (NLP) tasks. Most existing textual data augmentation methods are either too conservative, making small changes to the original text, or too aggressive, creating entirely new samples. With GENIUS, we propose GeniusAug, which first extracts target-aware sketches from the original training set and then generates new samples based on those sketches. Empirical experiments on 6 text classification datasets show that GeniusAug significantly improves models' performance in both in-distribution (ID) and out-of-distribution (OOD) settings. We also demonstrate the effectiveness of GeniusAug on named entity recognition (NER) and machine reading comprehension (MRC) tasks. (Code and models are publicly available at https://github.com/microsoft/SCGLab and https://github.com/beyondguo/genius)
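Concretely, a sketch keeps a few informative spans and replaces everything between them with mask tokens, and the model learns to reconstruct the full text. The snippet below is a minimal, hypothetical sketch-extraction heuristic (keeping the longest words as stand-ins for key information and collapsing each masked run into a single `<mask>` token); the paper's actual sketch extraction is target-aware and considerably more sophisticated:

```python
import re

MASK = "<mask>"

def make_sketch(text: str, keep_ratio: float = 0.2) -> str:
    """Keep a few 'key' words and collapse each masked-out run of words into
    a single mask token, yielding a GENIUS-style input sketch."""
    words = re.findall(r"\w+", text)
    n_keep = max(1, int(len(words) * keep_ratio))
    # Toy informativeness heuristic: longer words stand in for key spans.
    uniq = list(dict.fromkeys(w.lower() for w in words))
    keep = set(sorted(uniq, key=len, reverse=True)[:n_keep])
    sketch, in_mask = [], False
    for w in words:
        if w.lower() in keep:
            sketch.append(w)
            in_mask = False
        elif not in_mask:
            sketch.append(MASK)
            in_mask = True
    return " ".join(sketch)

text = ("GENIUS is pre-trained to reconstruct full passages from sketches of "
        "key spans, so at inference time it can expand a sketch into fluent text.")
print(make_sketch(text))
# -> "<mask> trained <mask> reconstruct <mask> passages <mask> sketches <mask> inference <mask>"
```

For augmentation, feeding such sketches (extracted from training examples) back into the pre-trained model yields new samples that keep the label-relevant content while varying everything else.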
GLM-130B: The most capable AI language model currently available comes from China
A Chinese language model outperforms OpenAI's GPT-3 and Google's PaLM, and Huawei presents a Codex alternative. Large AI models for language, code, and images play a central role in the current proliferation of artificial intelligence; researchers at Stanford University have therefore even proposed calling such models "foundation models." The pioneer in the development of very large AI models is the U.S. AI company OpenAI, whose GPT-3 language model first demonstrated the usefulness of such systems.