Goto

Collaborating Authors

 compressive transformer


NLP Tutorials -- Part 20: Compressive Transformer

#artificialintelligence

Welcome back to yet another interesting improvement of the Transformer (Attention is All You Need) architecture -- Compressive Transformers. This particular architecture has a lower memory requirement than Vanilla Transformer and is similar to the Transformer-XL that models longer sequences efficiently. The below image depicts how the memory is compressed. We can also say that this is drawing some parallels to the human brain -- We have a brilliant memory because of the power of compressing and storing information very intelligently. This sure seems interesting, doesn't it?


Infinite Memory Transformer: Attending to Arbitrarily Long Contexts Without Increasing Computation Burden

#artificialintelligence

When reading a novel, humans naturally remember relevant plot information even if it was presented many chapters earlier. Although today's transformer-based language models have made impressive progress in natural language processing, they struggle in this regard, as the compute required for modelling long-term memories grows quadratically with the length of the text and will eventually exceed the model's finite memory capacity. To overcome this limitation, a research team from Instituto de Telecomunicações, DeepMind, Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel has proposed " -former" (infinite former) -- a transformer model equipped with unbounded long-term memory (LTM) that enables it to attend to arbitrarily long contexts. The team extends the vanilla transformer with a continuous LTM to enable their proposed -former to access long-range context. The novel approach employs a continuous space attention framework to attend over the LTM signal, in which key matrix size depends on the number of basis functions instead of the length of the context being attended to.


Facebook Open-Sources Expire-Span Method for Scaling Transformer AI

#artificialintelligence

Facebook AI Research (FAIR) open-sourced Expire-Span, a deep-learning technique that learns which items in an input sequence should be remembered, reducing the memory and computation requirements for AI. FAIR showed that Transformer models that incorporate Expire-Span can scale to sequences of tens of thousands of items with improved performance compared to previous models. The research team described the technique and several experiments in a paper to be presented at the upcoming International Conference on Machine Learning (ICML). Expire-Span allows sequential AI models to "forget" events that are no longer relevant. When incorporated into self-attention models, such as the Transformer, Expire-Span reduces the amount of memory needed, allowing the model to handle longer sequences, which is key to improved performance on many tasks, such as natural language processing (NLP).


Not All Memories are Created Equal: Learning to Forget by Expiring

arXiv.org Artificial Intelligence

Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant information. This forgetting of memories enables Transformers to scale to attend over tens of thousands of previous timesteps efficiently, as not all states from previous timesteps are preserved. We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality. Next, we show that Expire-Span can scale to memories that are tens of thousands in size, setting a new state of the art on incredibly long context tasks such as character-level language modeling and a frame-by-frame moving objects task. Finally, we analyze the efficiency of Expire-Span compared to existing approaches and demonstrate that it trains faster and uses less memory.


Can the study of sleep help create better AI models?

#artificialintelligence

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. One obvious reason is to restore the strength of our bodies and limbs. But another very important role of sleep is to consolidate memories and organize all the information that your brain has ingested while being awake. People who lack proper sleep see their cognitive abilities degrade and their memories fail. The wonders and mysteries of sleep remain an active area of research.


Compressive Transformers for Long-Range Sequence Modelling

arXiv.org Machine Learning

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Com-pressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17. 1 ppl and 0. 97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19. Humans have a remarkable ability to remember information over long time horizons. When reading a book, we build up a compressed representation of the past narrative, such as the characters and events that have built up the story so far. We can do this even if they are separated by thousands of words from the current text, or long stretches of time between readings. During daily life, we make use of memories at varying timescales: from locating the car keys, placed in the morning, to recalling the name of an old friend from decades ago. These feats of memorisation are not achieved by storing every sensory glimpse throughout one's lifetime, but via lossy compression. We aggressively select, filter, or integrate input stimuli based on factors of surprise, perceived danger, or repetition -- amongst other signals (Richards and Frankland, 2017). Memory systems in artificial neural networks began with very compact representations of the past. Recurrent neural networks (RNNs, Rumelhart et al. (1986)) learn to represent the history of observations in a compressed state vector. The state is compressed because it uses far less space than the history of observations -- the model only preserving information that is pertinent to the optimization of the loss.