AITopics | lsh attention

lsh attention

5edc4f7dce28c711afc6265b4f99bf57-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 22:36:56 GMT

accuracy, computer vision, lsh attention, (14 more...)

Neural Information Processing Systems

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence > Vision (0.76)

5edc4f7dce28c711afc6265b4f99bf57-Supplemental.pdf

Neural Information Processing Systems, Aug-14-2025, 18:37:15 GMT

accuracy, computer vision, lsh attention, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.76)

Reformer: The Efficient Transformer

Kitaev, Nikita, Kaiser, Łukasz, Levskaya, Anselm

arXiv.org Machine Learning, Jan-13-2020

Large Transformer models routinely achieve state-of-the-art results on a number of tasks, but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of Transformers. For one, we replace dot-product attention by one that uses locality-sensitive hashing, changing its complexity from O(L^2) to O(L log L), where L is the length of the sequence. Furthermore, we use reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models while being much more memory-efficient and much faster on long sequences.

The Transformer architecture (Vaswani et al., 2017) is widely used in natural language processing and yields state-of-the-art results on a number of tasks. To obtain these results, researchers have resorted to training ever larger Transformer models. The number of parameters exceeds 0.5B per layer in the largest configuration reported in (Shazeer et al., 2018), while the number of layers goes up to 64 in (Al-Rfou et al., 2018). Transformer models are also used on increasingly long sequences.
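The abstract's claim that reversible residual layers let activations be stored only once can be illustrated with a small sketch. It assumes the usual two-stream coupling form of reversible residuals (y1 = x1 + F(x2), y2 = x2 + G(y1)), which the excerpt above does not spell out. The names `make_layer`, `rev_forward`, `rev_inverse`, and `roundtrip_error` are hypothetical, and the toy tanh sublayers merely stand in for attention and feed-forward blocks. This is not the authors' implementation.

```python
import math
import random


def make_layer(dim, seed):
    """Build a toy deterministic sublayer x -> tanh(W x) (a stand-in, not real attention)."""
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]

    return layer


def rev_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + F(x2), y2 = x2 + G(y1)."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recover the block's inputs from its outputs by undoing the two additions."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


def roundtrip_error(n_layers, dim):
    """Run n_layers blocks forward keeping only the final output, then invert them all."""
    blocks = [(make_layer(dim, 2 * i), make_layer(dim, 2 * i + 1)) for i in range(n_layers)]
    rng = random.Random(1234)
    x1 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    x2 = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    h1, h2 = x1, x2
    for f, g in blocks:  # no intermediate activations are saved here
        h1, h2 = rev_forward(h1, h2, f, g)

    for f, g in reversed(blocks):  # reconstruct the activations layer by layer
        h1, h2 = rev_inverse(h1, h2, f, g)

    return max(abs(a - b) for a, b in zip(h1 + h2, x1 + x2))


if __name__ == "__main__":
    print(roundtrip_error(n_layers=6, dim=4))
```

Because each block's inputs can be recomputed from its outputs, a backward pass needs only the final layer's activations rather than one stored copy per layer. The price is recomputing F and G during the backward pass, plus a small floating-point reconstruction error.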

lsh attention, sequence, transformer, (17 more...)

arXiv.org Machine Learning

2001.04451

Genre: Research Report (0.84)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
