GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and to unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we faced numerous unexpected technical and engineering challenges, particularly around loss spikes and divergence. In this paper, we introduce the training process of GLM-130B, including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resulting GLM-130B model significantly outperforms GPT-3 175B (davinci) on a wide range of popular English benchmarks, a performance advantage not observed in OPT-175B and BLOOM-176B. It also consistently and significantly outperforms ERNIE TITAN 3.0 260B, the largest Chinese language model, across related benchmarks. Finally, we leverage a unique scaling property of GLM-130B to reach INT4 quantization without post-training, with almost no performance loss, making it the first among 100B-scale models to do so and, more importantly, allowing effective inference on 4×RTX 3090 (24G) or 8×RTX 2080 Ti (11G) GPUs, the most affordable GPUs required for using 100B-scale models. The GLM-130B model weights are publicly accessible, and its code, training logs, related toolkit, and lessons learned are open-sourced at https://github.com/THUDM/GLM-130B/.
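The INT4 result above refers to weight-only quantization applied directly to the trained weights, with no quantization-aware training. As a rough, self-contained illustration of what absmax-scaled, round-to-nearest INT4 weight quantization involves, here is a NumPy sketch; the function names and the per-row scaling granularity are illustrative choices of this sketch, not GLM-130B's actual implementation:

```python
import numpy as np

def quantize_int4_absmax(w: np.ndarray):
    """Symmetric round-to-nearest INT4 quantization with one scale per row.

    Illustrative only: real pipelines quantize the linear layers of a
    trained transformer and pack two 4-bit values per byte.
    """
    # Signed INT4 spans [-8, 7]; scale so the largest |weight| maps to 7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover approximate FP32 weights for use in matrix multiplies.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = quantize_int4_absmax(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```

Storing 4 bits per weight instead of 16 is what brings a 130B-parameter model within reach of consumer GPUs such as the RTX 3090s mentioned above.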
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Xuying Meng, Siqi Fan, Peng Han, Jing Li, Li Du, Bowen Qin, Zheng Zhang, Aixin Sun, Yequan Wang
Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks, among others. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost and (ii) fair and objective evaluations. In this paper, we report a solution that significantly reduces LLM training cost through a growth strategy: we demonstrate that a 101B-parameter LLM can be trained on 0.31T tokens with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing knowledge-oriented evaluations. These IQ evaluations cover symbolic mapping, rule understanding, pattern mining, and anti-interference, and they minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, trained with a budget of 100K US dollars, achieves performance comparable to powerful and well-known models such as GPT-3 and GLM-130B, especially on the additional range of IQ evaluations. The checkpoint of FLM-101B is released at https://huggingface.co/CofeAI/FLM-101B.
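A growth strategy keeps cost down by training a smaller model first, then expanding it into a larger one that computes (approximately) the same function, so training resumes from the grown checkpoint instead of from scratch. Below is a toy NumPy sketch of function-preserving width growth for a two-layer MLP, in the spirit of Net2Net-style expansion; the operator, its name (`widen_linear_pair`), and the details are assumptions of this sketch and differ from FLM-101B's actual growth procedure:

```python
import numpy as np

def widen_linear_pair(w1, b1, w2, new_hidden, rng):
    """Grow the hidden width of a two-layer MLP (x @ w1 + b1 -> ReLU -> @ w2)
    while preserving its function: copy existing hidden units into the new
    slots and split the outgoing weights of every duplicated unit."""
    old_hidden = w1.shape[1]
    # Each new slot copies a randomly chosen existing unit.
    mapping = np.concatenate([np.arange(old_hidden),
                              rng.integers(0, old_hidden, new_hidden - old_hidden)])
    counts = np.bincount(mapping, minlength=old_hidden)  # duplication count per unit
    w1_new = w1[:, mapping]
    b1_new = b1[mapping]
    w2_new = w2[mapping] / counts[mapping, None]  # split each unit's output contribution
    return w1_new, b1_new, w2_new

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w1, b1, w2 = rng.normal(size=(16, 32)), rng.normal(size=32), rng.normal(size=(32, 8))
y_small = np.maximum(x @ w1 + b1, 0.0) @ w2

w1n, b1n, w2n = widen_linear_pair(w1, b1, w2, new_hidden=64, rng=rng)
y_big = np.maximum(x @ w1n + b1n, 0.0) @ w2n
print("max diff:", np.abs(y_small - y_big).max())  # ~0: same function, more capacity
```

Because the widened network starts out computing the same function as the small one, the expensive large-model phase begins from an already-trained state, which is where the budget savings come from.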
GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation
Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen
We introduce GENIUS: a conditional text generation model that takes sketches as input and fills in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens). GENIUS is pre-trained on a large-scale textual corpus with a novel reconstruction-from-sketch objective using an extreme and selective masking strategy, enabling it to generate diverse and high-quality texts from sketches. Comparison with other competitive conditional language models (CLMs) reveals the superiority of GENIUS's text generation quality. We further show that GENIUS can serve as a strong, ready-to-use data augmentation tool for various natural language processing (NLP) tasks. Most existing textual data augmentation methods are either too conservative, making small changes to the original text, or too aggressive, creating entirely new samples. With GENIUS, we propose GeniusAug, which first extracts target-aware sketches from the original training set and then generates new samples based on those sketches. Empirical experiments on 6 text classification datasets show that GeniusAug significantly improves models' performance in both in-distribution (ID) and out-of-distribution (OOD) settings. We also demonstrate the effectiveness of GeniusAug on named entity recognition (NER) and machine reading comprehension (MRC) tasks. (Code and models are publicly available at https://github.com/microsoft/SCGLab and https://github.com/beyondguo/genius)
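Concretely, a sketch keeps a few informative spans and replaces everything between them with mask tokens, and the model learns to reconstruct the full text. The snippet below is a minimal, hypothetical sketch-extraction heuristic (keeping the longest words as stand-ins for key information and collapsing each masked run into a single `<mask>` token); the paper's actual sketch extraction is target-aware and considerably more sophisticated:

```python
import re

MASK = "<mask>"

def make_sketch(text: str, keep_ratio: float = 0.2) -> str:
    """Keep a few 'key' words and collapse each masked-out run of words into
    a single mask token, yielding a GENIUS-style input sketch."""
    words = re.findall(r"\w+", text)
    n_keep = max(1, int(len(words) * keep_ratio))
    # Toy informativeness heuristic: longer words stand in for key spans.
    uniq = list(dict.fromkeys(w.lower() for w in words))
    keep = set(sorted(uniq, key=len, reverse=True)[:n_keep])
    sketch, in_mask = [], False
    for w in words:
        if w.lower() in keep:
            sketch.append(w)
            in_mask = False
        elif not in_mask:
            sketch.append(MASK)
            in_mask = True
    return " ".join(sketch)

text = ("GENIUS is pre-trained to reconstruct full passages from sketches of "
        "key spans, so at inference time it can expand a sketch into fluent text.")
print(make_sketch(text))
# -> "<mask> trained <mask> reconstruct <mask> passages <mask> sketches <mask> inference <mask>"
```

For augmentation, feeding such sketches (extracted from training examples) back into the pre-trained model yields new samples that keep the label-relevant content while varying everything else.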
GLM-130B: The most capable AI language model currently available comes from China
A Chinese language model outperforms OpenAI's GPT-3 and Google's PaLM, and Huawei presents a Codex alternative. Large AI models for language, code, and images play a central role in the current proliferation of artificial intelligence; researchers at Stanford University have therefore even proposed calling such models "foundation models." The pioneer in the development of very large AI models is the U.S. AI company OpenAI, whose GPT-3 language model first demonstrated the usefulness of such systems.