auto-regressive language modeling
Decoupled Context Processing for Context Augmented Language Modeling
Zonglin Li
Language models can be augmented with a context retriever to incorporate knowledge from large external databases. By leveraging retrieved context, the neural network does not have to memorize the massive amount of world knowledge within its internal parameters, leading to better parameter efficiency, interpretability and modularity.
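To make the decoupling concrete, here is a minimal sketch (not the paper's exact architecture; the dimensions, the random stand-in encoder, and the function names are assumptions) of how retrieved context can be encoded once, independently of the query, and then consumed by the decoder through cross-attention:

```python
# Illustrative sketch of decoupled context processing for a
# retrieval-augmented LM: the retrieved context is encoded once,
# independently of the query, and the decoder consumes the cached
# encodings through cross-attention.
import numpy as np

rng = np.random.default_rng(0)
d = 64                      # hidden size (assumed)

def encode_context(context_tokens):
    """Stand-in context encoder: runs once per retrieved document and
    can be precomputed/cached offline because it never sees the query."""
    return rng.standard_normal((len(context_tokens), d))

def cross_attention(decoder_states, context_encodings):
    """Decoder states attend over the precomputed context encodings."""
    scores = decoder_states @ context_encodings.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context_encodings

# Precompute encodings for retrieved documents (independent of the query).
retrieved_docs = [["external", "knowledge", "snippet"], ["another", "document"]]
cached = np.concatenate([encode_context(doc) for doc in retrieved_docs])

# At generation time the decoder only pays the cost of cross-attention.
decoder_states = rng.standard_normal((5, d))   # 5 decoder positions
augmented = decoder_states + cross_attention(decoder_states, cached)
print(augmented.shape)   # (5, 64)
```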
Searching for Efficient Transformers for Language Modeling
Large Transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here we aim to reduce the costs of Transformers by searching for a more efficient variant. Compared to previous approaches, our search is performed at a lower level, over the primitives that define a Transformer TensorFlow program. We identify an architecture, named Primer, that has a smaller training cost than the original Transformer and other variants for auto-regressive language modeling.
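The abstract does not spell out what the search found, but Primer's best-known modifications are squaring the ReLU activation in the feed-forward block and adding depthwise convolutions after the attention projections. A minimal sketch of the squared-ReLU feed-forward layer (layer sizes and initialization here are illustrative assumptions, not the paper's configuration):

```python
# Primer's squared-ReLU variant of the Transformer feed-forward block.
import numpy as np

def squared_relu(x):
    return np.maximum(x, 0.0) ** 2

def feed_forward(x, w_in, w_out):
    """Standard two-layer Transformer MLP with Primer's squared ReLU."""
    return squared_relu(x @ w_in) @ w_out

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256
x = rng.standard_normal((10, d_model))          # 10 token positions
w_in = rng.standard_normal((d_model, d_ff)) * 0.02
w_out = rng.standard_normal((d_ff, d_model)) * 0.02
print(feed_forward(x, w_in, w_out).shape)       # (10, 64)
```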
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.85)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Cognitive Science (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
On the Power of Decision Trees in Auto-Regressive Language Modeling
Originally proposed for handling time series data, Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling. This paper delves into both the theoretical and practical applications of ARDTs in this new context. We theoretically demonstrate that ARDTs can compute complex functions, such as simulating automata, Turing machines, and sparse circuits, by leveraging "chain-of-thought" computations. Our analysis provides bounds on the size, depth, and computational efficiency of ARDTs, highlighting their surprising computational power. Empirically, we train ARDTs on simple language generation tasks, showing that they can learn to generate coherent and grammatically correct text on par with a smaller Transformer model.
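As a toy illustration of the idea (not the paper's training recipe; the corpus, window size, and use of scikit-learn are assumptions), a decision tree can be fit to predict the next token id from a fixed window of previous tokens and then run auto-regressively by feeding its own predictions back in:

```python
# Toy auto-regressive decision tree (ARDT) language model: a decision
# tree predicts the next token id from a window of previous token ids,
# and generation feeds its predictions back into the context.
from sklearn.tree import DecisionTreeClassifier

text = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(text))
tok = {w: i for i, w in enumerate(vocab)}
ids = [tok[w] for w in text]

window = 3
X = [ids[i:i + window] for i in range(len(ids) - window)]
y = [ids[i + window] for i in range(len(ids) - window)]

tree = DecisionTreeClassifier().fit(X, y)

# Auto-regressive generation: each predicted token becomes part of the
# context for the next prediction.
context = ids[:window]
generated = []
for _ in range(8):
    nxt = int(tree.predict([context[-window:]])[0])
    generated.append(vocab[nxt])
    context.append(nxt)
print(" ".join(generated))
```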
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Tai, Yintao, Liao, Xiyang, Suglia, Alessandro, Vergari, Antonio
Recent works showed the possibility of building open-vocabulary large language models (LLMs) that directly operate on pixel representations and are implemented as encoder-decoder models that reconstruct masked image patches of rendered text. However, these pixel-based LLMs are limited to autoencoding tasks and cannot generate new text as images. As such, they cannot be used for open-answer or generative language tasks. In this work, we overcome this limitation and introduce PIXAR, the first pixel-based autoregressive LLM that does not rely on a pre-defined vocabulary for either input or output text. Consisting of only a decoder, PIXAR can answer free-form generative tasks while keeping the text representation learning performance on par with previous encoder-decoder models. Furthermore, we highlight the challenges of autoregressively generating non-blurred text as images and link this to the usual maximum likelihood objective. We propose a simple adversarial pretraining that significantly improves the readability and performance of PIXAR, making it comparable to GPT-2 on short text generation tasks. This paves the way to building open-vocabulary LLMs that are usable for free-form generative tasks and questions the necessity of the usual symbolic input representation -- text as tokens -- for these challenging tasks.
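A minimal sketch of the decoder-only, patch-level auto-regressive loop the abstract describes (the patch size and the random stand-in predictor are assumptions; the real model renders text to pixels and learns a Transformer decoder over those patches):

```python
# Sketch of PIXAR-style generation: extend a sequence of pixel patches
# one patch at a time, the way a token LM extends text.
import numpy as np

rng = np.random.default_rng(0)
patch_dim = 8 * 8            # flattened 8x8 grayscale patch (assumed size)

def predict_next_patch(previous_patches):
    """Stand-in for the learned decoder: maps the patches generated so
    far to the next patch of rendered text (random here, for shape only)."""
    return rng.standard_normal(patch_dim)

# Start from a prompt rendered as pixel patches (random placeholders),
# then autoregressively append new patches.
sequence = [rng.standard_normal(patch_dim) for _ in range(4)]
for _ in range(6):
    sequence.append(predict_next_patch(sequence))
print(len(sequence), sequence[0].shape)   # 10 (64,)
```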
Hot papers on arXiv from the past month: September 2021
[Figure: Comparing the visual quality of generated frames. From Diverse Generation from a Single Video Made Possible. Reproduced under a CC BY 4.0 license.]
Here are the most tweeted papers that were uploaded onto arXiv during September 2021. Results are powered by Arxiv Sanity Preserver. Abstract: Highly realistic human faces generated by generative adversarial networks (GANs) have been used as profile images for fake social media accounts and are visually challenging to discern from real ones.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)