Infinite Memory Transformer: Attending to Arbitrarily Long Contexts Without Increasing Computation Burden
When reading a novel, humans naturally remember relevant plot information even if it was presented many chapters earlier. Although today's transformer-based language models have made impressive progress in natural language processing, they struggle in this regard: the computation required by their attention mechanism grows quadratically with the length of the text, and long-range context eventually exceeds the model's finite memory capacity.

To overcome this limitation, a research team from Instituto de Telecomunicações, DeepMind, the Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel has proposed the "∞-former" (infinite former), a transformer model equipped with an unbounded long-term memory (LTM) that enables it to attend to arbitrarily long contexts. The team extends the vanilla transformer with a continuous LTM that gives the ∞-former access to long-range context. The approach employs a continuous-space attention framework to attend over the LTM signal, in which the size of the attention key matrix depends on the number of basis functions rather than on the length of the context being attended to.
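Concretely, the paper represents the long-term memory as a continuous signal built from a fixed number of basis functions and attends to it with a probability density over the signal's domain, so the cost of attending is tied to the number of basis functions rather than the number of past tokens. The NumPy sketch below is a minimal, illustrative rendering of that idea, not the authors' implementation: the function names, the ridge-regression fit, the grid-based quadrature, and the way the Gaussian mean is derived from the query are simplifying assumptions made for brevity.

```python
import numpy as np

def rbf_basis(t, centers, width):
    # psi(t): evaluate N radial basis functions at positions t in [0, 1]
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))  # (len(t), N)

def fit_continuous_memory(X, num_basis=32, width=0.05, ridge=1e-4):
    """Compress a discrete memory X (L, d) into coefficients B (num_basis, d)
    of a continuous signal x_bar(t) = B^T psi(t), via ridge regression.
    The memory footprint now depends on num_basis, not on L."""
    L, _ = X.shape
    t = np.linspace(0.0, 1.0, L)                    # map token positions to [0, 1]
    centers = np.linspace(0.0, 1.0, num_basis)
    F = rbf_basis(t, centers, width)                # (L, N) design matrix
    # B = (F^T F + ridge * I)^{-1} F^T X  -> (N, d)
    B = np.linalg.solve(F.T @ F + ridge * np.eye(num_basis), F.T @ X)
    return B, centers

def continuous_attention(query, B, centers, width, sigma=0.05):
    """Attend over the continuous LTM: the query induces a Gaussian density
    over [0, 1], and the output is the expectation of x_bar(t) under it.
    Deriving the mean from query.mean() is a hypothetical stand-in for the
    small learned networks the paper uses to predict (mu, sigma)."""
    mu = 1.0 / (1.0 + np.exp(-query.mean()))        # squash a query statistic into (0, 1)
    t_grid = np.linspace(0.0, 1.0, 512)             # quadrature grid for the expectation
    density = np.exp(-((t_grid - mu) ** 2) / (2 * sigma ** 2))
    density /= density.sum()
    psi = rbf_basis(t_grid, centers, width)         # (512, N)
    expected_psi = density @ psi                    # E_p[psi(t)], shape (N,)
    return expected_psi @ B                         # E_p[x_bar(t)], shape (d,)

# Usage: compress a 10,000-token memory into 32 basis coefficients, then attend.
X = np.random.randn(10_000, 64)                     # (context length, hidden dim)
B, centers = fit_continuous_memory(X, num_basis=32)
out = continuous_attention(np.random.randn(64), B, centers, width=0.05)
print(B.shape, out.shape)                           # (32, 64) (64,)
```

The point of the sketch is the scaling behaviour: whether the memory holds one thousand or one million past tokens, attention only ever touches the 32 basis coefficients.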
Dec-11-2021, 16:35:54 GMT