Google's CoLT5 Processes Extremely Long Inputs via Conditional Computation


One of the highlights of OpenAI's GPT-4 large language model (LLM) is its expanded context window of 32,000 tokens (about 25,000 words), which enables longer input sequences and conversations than ChatGPT's 4,000-token limit. While expanding the processing capacity of transformer-based LLMs in this way is beneficial, it is also computationally costly, due to the quadratic complexity of the models' attention mechanisms and the application of feedforward and projection layers to every token.

A Google Research team addresses this issue in the new paper CoLT5: Faster Long-Range Transformers with Conditional Computation, proposing CoLT5 (Conditional LongT5), a family of transformer models that apply a novel conditional computation approach for higher-quality and faster processing of long inputs of up to 64,000 tokens. CoLT5 is built on Google's LongT5 (Guo et al., 2022), which simultaneously scales input length and model size to improve long-input processing in transformers; it is inspired by the idea that better performance at reduced computation cost can be achieved via a "conditional computation" approach that allocates more computation to important tokens.

The conditional computation mechanism comprises three main components: 1) Routing modules, which select important tokens at each attention or feedforward layer; 2) A conditional feedforward layer, which applies an additional high-capacity feedforward layer to the selected routed tokens; and 3) A conditional attention layer, which enables CoLT5 to differentiate between tokens that require additional information and those that already possess it.
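To make the routing idea concrete, the following is a minimal NumPy sketch of a conditional feedforward step in the spirit of the paper: every token passes through a light feedforward branch, while a learned routing score selects the top-k "important" tokens, which additionally receive a high-capacity branch. All names, shapes, and the simple score-based gating here are illustrative assumptions, not the paper's exact implementation (CoLT5 learns routing jointly with the model and normalizes the router scores).

```python
import numpy as np

def conditional_ffd(tokens, w_route, light_ffn, heavy_ffn, k):
    """Sketch of a CoLT5-style conditional feedforward layer.

    tokens:    (n, d) array of token representations
    w_route:   (d,) routing projection that scores token importance
    light_ffn: cheap branch applied to every token
    heavy_ffn: high-capacity branch applied only to routed tokens
    k:         number of important tokens routed to the heavy branch
    """
    scores = tokens @ w_route                 # one importance score per token
    top_k = np.argsort(scores)[-k:]           # indices of the k highest-scoring tokens
    out = light_ffn(tokens)                   # light branch: all n tokens
    # Heavy branch: only the routed tokens, gated by their routing score.
    # (The paper uses normalized router scores; raw scores here for brevity.)
    gate = scores[top_k][:, None]
    out[top_k] += gate * heavy_ffn(tokens[top_k])
    return out

# Toy usage with random weights (shapes and values are illustrative)
rng = np.random.default_rng(0)
n, d = 8, 4
tokens = rng.normal(size=(n, d))
w_route = rng.normal(size=d)
W_light = rng.normal(size=(d, d)) * 0.1
W_heavy = rng.normal(size=(d, d))
out = conditional_ffd(tokens, w_route,
                      lambda x: x @ W_light,
                      lambda x: x @ W_heavy,
                      k=2)
```

Because the heavy branch touches only k of the n tokens, its cost grows with k rather than with the full sequence length, which is what lets CoLT5 scale to much longer inputs than applying a uniform high-capacity layer everywhere.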
