AITopics | Zhou, Pingyi

Collaborating Authors

Zhou, Pingyi

XL3M: A Training-free Framework for LLM Length Extension Based on Segment-wise Inference

Wang, Shengnan, Bai, Youhui, Zhang, Lin, Zhou, Pingyi, Zhao, Shixiong, Zhang, Gong, Wang, Sen, Chen, Renhai, Xu, Hua, Sun, Hongwei

arXiv.org Artificial Intelligence | May-27-2024

The length generalization failure problem, in which a large language model (LLM) fails to generalize to texts longer than its maximum training length, greatly restricts the application of LLMs in scenarios with streaming long inputs. To address this problem, existing methods either require substantial costs or introduce precision loss. In this paper, we empirically find that the accuracy of the LLM's prediction is highly correlated with its certainty. Based on this, we propose an efficient training-free framework, named XL3M (extra-long large language model), which enables LLMs trained on short sequences to reason over extremely long sequences without any further training or fine-tuning. Under the XL3M framework, the input context is first decomposed into multiple short sub-contexts, where each sub-context contains an independent segment and a common "question", which is a few tokens from the end of the original context. XL3M then measures the relevance between each segment and the "question", and constructs a concise key context by splicing all the relevant segments in chronological order. The key context is then used instead of the original context to complete the inference task. Evaluations on comprehensive benchmarks show the superiority of XL3M. Using our framework, a Llama2-7B model is able to reason over sequences of length 20M on an 8-card Huawei Ascend 910B NPU machine with 64GB memory per card.
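The segment-wise pipeline described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the word-level "tokens", and the fixed `top_k` cutoff are all hypothetical, and `overlap_score` is a toy stand-in for the certainty-based relevance measure, which the abstract does not specify.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str], Sequence[str]], float]


def overlap_score(segment: Sequence[str], question: Sequence[str]) -> float:
    """Toy relevance: count of distinct shared tokens (NOT the paper's measure)."""
    return float(len(set(segment) & set(question)))


def build_key_context(
    tokens: Sequence[str],
    segment_len: int,
    question_len: int,
    top_k: int,
    score: Scorer = overlap_score,
) -> List[str]:
    """Split the context into segments, keep the most relevant, splice in order."""
    if not 0 < question_len < len(tokens):
        raise ValueError("question_len must be in (0, len(tokens))")
    if segment_len <= 0 or top_k <= 0:
        raise ValueError("segment_len and top_k must be positive")

    # The "question" is the last few tokens of the original context.
    question = list(tokens[-question_len:])
    body = list(tokens[:-question_len])

    # Independent fixed-length segments of the remaining context.
    segments = [body[i:i + segment_len] for i in range(0, len(body), segment_len)]

    # Rank segments by relevance (ties go to the earlier segment),
    # then restore chronological order among the kept ones.
    ranked = sorted(range(len(segments)), key=lambda i: (-score(segments[i], question), i))
    kept = sorted(ranked[:top_k])

    key_context: List[str] = []
    for i in kept:
        key_context.extend(segments[i])
    return key_context + question
```

In a real setting the scorer would presumably call the LLM on each short sub-context (segment plus question); the splice-in-chronological-order step is the part taken directly from the abstract.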

artificial intelligence, large language model, natural language, (14 more...)

arXiv: 2405.17755

Genre: Research Report (0.50)

Industry: Telecommunications (0.38)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Extending Context Window of Large Language Models via Semantic Compression

Fei, Weizhi, Niu, Xueyan, Zhou, Pingyi, Hou, Lu, Bai, Bo, Deng, Lei, Han, Wei

arXiv.org Artificial Intelligence | Dec-15-2023

Transformer-based Large Language Models (LLMs) often impose limitations on the length of the text input to ensure the generation of fluent and relevant responses. This constraint restricts their applicability in scenarios involving long texts. We propose a novel semantic compression method that enables generalization to texts that are 6-8 times longer, without incurring significant computational costs or requiring fine-tuning. Our proposed framework draws inspiration from source coding in information theory and employs a pre-trained model to reduce the semantic redundancy of long inputs before passing them to the LLMs for downstream tasks. Experimental results demonstrate that our method effectively extends the context window of LLMs across a range of tasks including question answering, summarization, few-shot learning, and information retrieval. Furthermore, the proposed semantic compression method exhibits consistent fluency in text generation while reducing the associated computational overhead.
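The compress-before-inference idea in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not say how inputs are split or which pre-trained model performs the compression, so the chunking by word count and the pluggable `compress` callable (with a toy truncating compressor) are hypothetical placeholders.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def compress_context(text: str, max_words: int, compress: Callable[[str], str]) -> str:
    """Compress each chunk independently and join the results in order."""
    return " ".join(compress(chunk) for chunk in chunk_words(text, max_words))


def keep_first_words(n: int) -> Callable[[str], str]:
    """Toy compressor that keeps the first n words of a chunk.

    A stand-in for a pre-trained summarization model, not the paper's method.
    """
    def compress(chunk: str) -> str:
        return " ".join(chunk.split()[:n])
    return compress
```

The shortened text returned by `compress_context` would then be passed to the LLM in place of the original input.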

large language model, machine learning, natural language, (17 more...)

arXiv: 2312.09571

Country:

Europe (0.46)
North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

Ren, Xiaozhe, Zhou, Pingyi, Meng, Xinfan, Huang, Xinjing, Wang, Yadao, Wang, Weichao, Li, Pengfei, Zhang, Xiaoda, Podolskiy, Alexander, Arshinov, Grigory, Bout, Andrey, Piontkovskaya, Irina, Wei, Jiansheng, Jiang, Xin, Su, Teng, Liu, Qun, Yao, Jun

arXiv.org Artificial Intelligence | Mar-19-2023

The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors and the MindSpore framework, and present the language model with 1.085T parameters named PanGu-Σ. With parameters inherited from PanGu-α, we extend the dense Transformer model to a sparse one with Random Routed Experts (RRE), and efficiently train the model over 329B tokens by using Expert Computation and Storage Separation (ECSS). This resulted in a 6.3x increase in training throughput through heterogeneous computing. Our experimental findings show that PanGu-Σ provides state-of-the-art performance in zero-shot learning of various Chinese NLP downstream tasks. Moreover, it demonstrates strong abilities when fine-tuned on application data of open-domain dialogue, question answering, machine translation and code generation.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv: 2303.10845

Country:

Europe (1.00)
Africa (0.92)
North America (0.67)
Asia > China > Guangdong Province (0.28)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Basketball (0.92)
Health & Medicine > Therapeutic Area (0.68)
Media (0.67)
Leisure & Entertainment > Sports > Football (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion

Gong, Zi, Guo, Yinpeng, Zhou, Pingyi, Gao, Cuiyun, Wang, Yasheng, Xu, Zenglin

arXiv.org Artificial Intelligence | Dec-19-2022

Code completion is a valuable topic in both academia and industry. Recently, large-scale mono-programming-lingual (MonoPL) pre-training models have been proposed to boost the performance of code completion. However, code completion on low-resource programming languages (PLs) is difficult for the data-driven paradigm, even though plenty of developers use low-resource PLs. On the other hand, few studies have explored the effects of multi-programming-lingual (MultiPL) pre-training on code completion, especially its impact on low-resource programming languages. To this end, we propose MultiCoder to enhance low-resource code completion via MultiPL pre-training and MultiPL Mixture-of-Experts (MoE) layers. We further propose a novel PL-level MoE routing strategy (PL-MoE) for improving code completion on all PLs. Experimental results on CodeXGLUE and MultiCC demonstrate that 1) the proposed MultiCoder significantly outperforms the MonoPL baselines on low-resource programming languages, and 2) the PL-MoE module further boosts the performance on six programming languages. In addition, we analyze the effects of the proposed method in detail and explore its effectiveness in a variety of scenarios.
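One plausible reading of "PL-level" routing is that the expert is chosen by the programming language of the snippet rather than by a learned per-token gate. The sketch below illustrates only that reading; the abstract does not describe the actual routing rule, and `PLRouter`, `make_scaling_expert`, and the fallback expert are hypothetical names introduced here. Experts are toy functions on plain lists, not neural layers.

```python
from typing import Callable, Dict, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def make_scaling_expert(scale: float) -> Expert:
    """Toy expert that multiplies every hidden value by a constant."""
    def expert(hidden: Vector) -> Vector:
        return [scale * h for h in hidden]
    return expert


class PLRouter:
    """Dispatch all token states of a snippet to the expert for its language.

    Assumption: routing depends only on the language label; unknown
    languages fall back to a shared expert.
    """

    def __init__(self, experts: Dict[str, Expert], fallback: Expert) -> None:
        self.experts = dict(experts)
        self.fallback = fallback

    def forward(self, language: str, hidden_states: List[Vector]) -> List[Vector]:
        expert = self.experts.get(language, self.fallback)
        return [expert(h) for h in hidden_states]
```

For example, `PLRouter({"go": make_scaling_expert(3.0)}, make_scaling_expert(1.0))` sends every token of a Go snippet through the same expert, while a snippet in an unlisted language uses the fallback.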

machine learning, multicoder, programming language, (18 more...)

arXiv: 2212.09666

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

CLSEBERT: Contrastive Learning for Syntax Enhanced Code Pre-Trained Model

Wang, Xin, Wang, Yasheng, Zhou, Pingyi, Mi, Fei, Xiao, Meng, Wang, Yadao, Li, Li, Liu, Xiao, Wu, Hao, Liu, Jin, Jiang, Xin

arXiv.org Artificial Intelligence | Aug-23-2021

Code pre-trained models have shown great success in various code-related tasks, such as code search, code clone detection, and code translation. Most existing code pre-trained models treat a code snippet as a plain sequence of tokens. However, the inherent syntax and hierarchy that provide important structure and semantic information are ignored. Sequence representations derived directly from the tokens are therefore insufficient. To this end, we propose CLSEBERT, a Contrastive Learning Framework for Syntax Enhanced Code Pre-Trained Model, to deal with various code intelligence tasks. In the pre-training stage, we consider the code syntax and hierarchy contained in the Abstract Syntax Tree (AST) and leverage contrastive learning (CL) to learn noise-invariant code representations. Besides the original masked language model (MLM) objective, we also introduce two novel pre-training objectives: (1) "AST Node Edge Prediction (NEP)" to predict edges between nodes in the abstract syntax tree; (2) "Code Token Type Prediction (TTP)" to predict the types of code tokens. Extensive experiments on four code intelligence tasks demonstrate the superior performance of CLSEBERT compared to the state of the art at the same pre-training corpus and parameter scale.
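To make the edge-prediction objective concrete, the sketch below shows how binary "is there an edge between these two AST nodes?" labels could be derived from source code. It is an illustration only: it uses Python's standard `ast` module as a stand-in parser, and the helper names `ast_edges` and `edge_label` are hypothetical. The abstract does not state which languages, parser, or pair-sampling scheme the authors use.

```python
import ast
from typing import List, Set, Tuple


def ast_edges(source: str) -> Tuple[List[str], Set[Tuple[int, int]]]:
    """Return node type names and the set of (parent, child) index pairs."""
    tree = ast.parse(source)
    # Skip context markers (Load/Store/Del): some CPython versions share
    # these objects between nodes, which would blur node identity.
    nodes = [n for n in ast.walk(tree) if not isinstance(n, ast.expr_context)]
    index = {id(n): i for i, n in enumerate(nodes)}
    edges = {
        (index[id(parent)], index[id(child)])
        for parent in nodes
        for child in ast.iter_child_nodes(parent)
        if id(child) in index
    }
    names = [type(n).__name__ for n in nodes]
    return names, edges


def edge_label(edges: Set[Tuple[int, int]], i: int, j: int) -> int:
    """Binary target for a node pair: 1 if directly connected, else 0."""
    return 1 if (i, j) in edges or (j, i) in edges else 0
```

A pre-training loop would sample node pairs, feed their representations to a classifier, and use `edge_label` as the target; that part is omitted here.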

deep learning, neural network, representation, (19 more...)

arXiv: 2108.04556

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
