AITopics | compressive transformer

compressive transformer

NLP Tutorials -- Part 20: Compressive Transformer

#artificialintelligence | Jun-3-2022, 07:00:32 GMT

Welcome back to another interesting improvement on the Transformer ("Attention Is All You Need") architecture: the Compressive Transformer. This architecture has a lower memory requirement than the vanilla Transformer and, like Transformer-XL, models longer sequences efficiently. An image in the original post depicts how the memory is compressed. The idea draws a parallel to the human brain: we have a brilliant memory because we compress and store information very intelligently. This sure seems interesting, doesn't it?

compressive transformer, transformer, transformer-xl, (12 more...)

Genre: Instructional Material (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

Infinite Memory Transformer: Attending to Arbitrarily Long Contexts Without Increasing Computation Burden

#artificialintelligence | Dec-11-2021, 16:35:54 GMT

When reading a novel, humans naturally remember relevant plot information even if it was presented many chapters earlier. Although today's transformer-based language models have made impressive progress in natural language processing, they struggle in this regard, as the compute required for modelling long-term memories grows quadratically with the length of the text and will eventually exceed the model's finite memory capacity. To overcome this limitation, a research team from Instituto de Telecomunicações, DeepMind, Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel has proposed "∞-former" (infinite former) -- a transformer model equipped with unbounded long-term memory (LTM) that enables it to attend to arbitrarily long contexts. The team extends the vanilla transformer with a continuous LTM to enable their proposed ∞-former to access long-range context. The novel approach employs a continuous space attention framework to attend over the LTM signal, in which key matrix size depends on the number of basis functions instead of the length of the context being attended to.
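The scaling argument in this summary can be made concrete with a little arithmetic. The sketch below only counts query-key scores; it is not the team's model. The function names, the 512 queries, and the 150 basis functions are hypothetical values chosen for illustration.

```python
def attention_score_count(num_queries: int, num_keys: int) -> int:
    """Number of query-key scores computed in one attention pass."""
    return num_queries * num_keys


def full_context_cost(context_len: int) -> int:
    """Every position attends to every position: quadratic in length."""
    return attention_score_count(context_len, context_len)


def fixed_memory_cost(num_queries: int, num_basis: int) -> int:
    """Queries attend to a memory whose size is the number of basis
    functions, so the cost does not depend on the context length."""
    return attention_score_count(num_queries, num_basis)


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    for context_len in (1_000, 10_000, 100_000):
        print(context_len,
              full_context_cost(context_len),
              fixed_memory_cost(num_queries=512, num_basis=150))
```

The first count grows a hundredfold each time the context grows tenfold, while the second stays constant because it never refers to the context length.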

computation burden, infinite memory transformer, long context, (5 more...)

AI-Alerts: 2021 > 2021-12 > AAAI AI-Alert for Dec 14, 2021 (1.00)

Genre: Research Report (0.38)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Facebook Open-Sources Expire-Span Method for Scaling Transformer AI

#artificialintelligence | Jun-19-2021, 13:49:38 GMT

Facebook AI Research (FAIR) open-sourced Expire-Span, a deep-learning technique that learns which items in an input sequence should be remembered, reducing the memory and computation requirements for AI. FAIR showed that Transformer models that incorporate Expire-Span can scale to sequences of tens of thousands of items with improved performance compared to previous models. The research team described the technique and several experiments in a paper to be presented at the upcoming International Conference on Machine Learning (ICML). Expire-Span allows sequential AI models to "forget" events that are no longer relevant. When incorporated into self-attention models, such as the Transformer, Expire-Span reduces the amount of memory needed, allowing the model to handle longer sequences, which is key to improved performance on many tasks, such as natural language processing (NLP).

expire-span, sequence, transformer, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Not All Memories are Created Equal: Learning to Forget by Expiring

Sukhbaatar, Sainbayar; Ju, Da; Poff, Spencer; Roller, Stephen; Szlam, Arthur; Weston, Jason; Fan, Angela

arXiv.org Artificial Intelligence | Jun-13-2021

Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant information. This forgetting of memories enables Transformers to scale to attend over tens of thousands of previous timesteps efficiently, as not all states from previous timesteps are preserved. We demonstrate that Expire-Span can help models identify and retain critical information and show it can achieve strong performance on reinforcement learning tasks specifically designed to challenge this functionality. Next, we show that Expire-Span can scale to memories that are tens of thousands in size, setting a new state of the art on incredibly long context tasks such as character-level language modeling and a frame-by-frame moving objects task. Finally, we analyze the efficiency of Expire-Span compared to existing approaches and demonstrate that it trains faster and uses less memory.
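The forgetting idea in this abstract can be pictured with a small sketch. It assumes each stored state carries an expiry span (fixed numbers here, standing in for a learned prediction) and that a state fades out linearly over a short ramp once its age exceeds that span. The function names, the ramp length, and the span values are illustrative assumptions, not the authors' code.

```python
from typing import List


def expire_mask(span: float, age: int, ramp: float) -> float:
    """Soft mask in [0, 1]: 1 while the state is within its span,
    then a linear fade to 0 over `ramp` further steps."""
    remaining = span - age
    return max(0.0, min(1.0, 1.0 + remaining / ramp))


def masked_attention(weights: List[float], spans: List[float],
                     t: int, ramp: float) -> List[float]:
    """Scale attention weights by each state's mask and renormalise.

    `weights[i]` is the attention weight on the state stored at step i,
    and `t` is the current step, so the state's age is t - i.
    """
    masked = [w * expire_mask(spans[i], t - i, ramp)
              for i, w in enumerate(weights)]
    total = sum(masked)
    if total == 0.0:
        return masked  # every state has expired
    return [m / total for m in masked]


if __name__ == "__main__":
    # Hypothetical spans: the first two states expire quickly.
    print(masked_attention([0.25, 0.25, 0.5],
                           spans=[1.0, 2.0, 10.0], t=6, ramp=4.0))
```

A state whose mask has reached zero contributes nothing to attention, so it can be dropped from storage, which is where the memory saving would come from.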

information, learning, span, (14 more...)

2105.06548

Country: Africa > Middle East > Egypt (0.05)

Genre: Research Report (0.50)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Can the study of sleep help create better AI models?

#artificialintelligence | Feb-18-2020, 04:11:44 GMT

This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. Why do we sleep? One obvious reason is to restore the strength of our bodies and limbs. But another very important role of sleep is to consolidate memories and organize all the information that your brain has ingested while being awake. People who lack proper sleep see their cognitive abilities degrade and their memories fail. The wonders and mysteries of sleep remain an active area of research.

compressive transformer, information, neural network, (13 more...)

Country: North America > United States > New York (0.05)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Compressive Transformers for Long-Range Sequence Modelling

Rae, Jack W.; Potapenko, Anna; Jayakumar, Siddhant M.; Lillicrap, Timothy P.

arXiv.org Machine Learning | Nov-13-2019

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19. Humans have a remarkable ability to remember information over long time horizons. When reading a book, we build up a compressed representation of the past narrative, such as the characters and events that have built up the story so far. We can do this even if they are separated by thousands of words from the current text, or long stretches of time between readings. During daily life, we make use of memories at varying timescales: from locating the car keys, placed in the morning, to recalling the name of an old friend from decades ago. These feats of memorisation are not achieved by storing every sensory glimpse throughout one's lifetime, but via lossy compression. We aggressively select, filter, or integrate input stimuli based on factors of surprise, perceived danger, or repetition -- amongst other signals (Richards and Frankland, 2017). Memory systems in artificial neural networks began with very compact representations of the past. Recurrent neural networks (RNNs, Rumelhart et al. (1986)) learn to represent the history of observations in a compressed state vector. The state is compressed because it uses far less space than the history of observations -- the model only preserving information that is pertinent to the optimization of the loss.
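The idea of compressing past memories rather than discarding them can be sketched in a few lines. This sketch assumes a fixed-size recent memory whose oldest entries are evicted, averaged in blocks, and pushed into a smaller compressed memory. Mean pooling is used only because it is the simplest lossy compression to show. The class name, sizes, and compression rate are hypothetical, not taken from the paper's code.

```python
from collections import deque
from typing import Deque, List

Vector = List[float]


def mean_pool(block: List[Vector]) -> Vector:
    """Average a block of equal-length vectors element-wise."""
    n = len(block)
    return [sum(column) / n for column in zip(*block)]


class CompressiveMemory:
    """Recent states kept verbatim, older states kept in pooled form."""

    def __init__(self, mem_size: int, cmem_size: int, rate: int) -> None:
        self.mem_size = mem_size
        self.rate = rate  # how many old states collapse into one
        self.memory: List[Vector] = []
        # deque(maxlen=...) silently drops the oldest compressed entry.
        self.compressed: Deque[Vector] = deque(maxlen=cmem_size)

    def update(self, new_states: List[Vector]) -> None:
        """Append new states; compress whatever overflows the memory."""
        self.memory.extend(new_states)
        overflow = len(self.memory) - self.mem_size
        if overflow <= 0:
            return
        evicted = self.memory[:overflow]
        self.memory = self.memory[overflow:]
        for start in range(0, len(evicted), self.rate):
            self.compressed.append(mean_pool(evicted[start:start + self.rate]))


if __name__ == "__main__":
    mem = CompressiveMemory(mem_size=4, cmem_size=2, rate=2)
    mem.update([[1.0], [2.0], [3.0], [4.0]])
    mem.update([[5.0], [6.0]])
    print(mem.memory, list(mem.compressed))
```

With a compression rate of 2, the compressed memory covers twice as many past steps per slot as the recent memory, at the price of losing detail within each pooled block.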

arxiv preprint arxiv, compressive transformer, transformer, (15 more...)

1911.05507

Country:

Africa > Middle East > Egypt (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
