Choromanski, Krzysztof Marcin
Mnemosyne: Learning to Train Transformers with Transformers
Jain, Deepali, Choromanski, Krzysztof Marcin, Dubey, Avinava, Singh, Sumeet, Sindhwani, Vikas, Zhang, Tingnan, Tan, Jie
In this work, we propose a new class of learnable optimizers, called \textit{Mnemosyne}. It is based on novel spatio-temporal low-rank implicit attention Transformers that can learn to train entire neural network architectures, including other Transformers, without any task-specific optimizer tuning. We show that Mnemosyne: (a) outperforms popular LSTM optimizers (also with new feature engineering to mitigate catastrophic forgetting of LSTMs), (b) can successfully train Transformers while using simple meta-training strategies that require minimal computational resources, and (c) matches the accuracy of SOTA hand-designed optimizers with carefully tuned hyper-parameters (often producing top-performing models). Furthermore, Mnemosyne provides space complexity comparable to that of its hand-designed first-order counterparts, which allows it to scale to training larger sets of parameters. We conduct an extensive empirical evaluation of Mnemosyne on: (a) fine-tuning a wide range of Vision Transformers (ViTs), from medium-size architectures to massive ViT-Hs (36 layers, 16 heads), (b) pre-training BERT models, and (c) soft prompt-tuning large 11B+ T5XXL models. We complement our results with a comprehensive theoretical analysis of the compact associative memory used by Mnemosyne, which, to the best of our knowledge, has not been conducted before.
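The abstract describes the high-level recipe (an attention mechanism consuming per-parameter optimization histories and emitting parameter updates) without code. The snippet below is only a minimal NumPy sketch of that recipe, assuming temporal linear attention over a short, explicit gradient history: all names (LearnedOptimizer, linear_attention, feature_map, W_q/W_k/W_v, hist_len) are hypothetical, the elu(x)+1 feature map is a generic low-rank attention choice rather than the paper's implicit attention, and the explicit history buffer is kept only for readability; it is not the paper's compact associative memory.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_map(X):
    # Generic positive feature map (elu(x) + 1 style) used by many linear-attention
    # variants; a stand-in for the paper's implicit attention mechanism.
    return np.where(X > 0, X + 1.0, np.exp(X))

def linear_attention(Q, K, V):
    # Low-rank attention in O(L * d * d_v): phi(Q) @ (phi(K)^T V), normalized,
    # never forming the L x L attention matrix.
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                      # (d, d_v)
    Z = Qf @ Kf.sum(axis=0)            # (L,)
    return (Qf @ KV) / (Z[:, None] + 1e-6)

class LearnedOptimizer:
    # Hypothetical learned optimizer: attends over a short per-parameter gradient
    # history and maps the read-out to an update. A real meta-learned optimizer
    # would train W_q / W_k / W_v on a distribution of optimization tasks.
    def __init__(self, hist_len=8, d_model=16):
        self.hist_len = hist_len
        self.W_q = rng.normal(scale=0.1, size=(1, d_model))
        self.W_k = rng.normal(scale=0.1, size=(1, d_model))
        self.W_v = rng.normal(scale=0.1, size=(1, 1))
        self.history = None            # explicit buffer for readability only

    def step(self, params, grads, lr=1e-3):
        g = grads.reshape(-1, 1)                                   # (P, 1)
        if self.history is None:
            self.history = np.zeros((g.shape[0], self.hist_len, 1))
        self.history = np.concatenate([self.history[:, 1:], g[:, None, :]], axis=1)
        updates = np.empty_like(g)
        for p in range(g.shape[0]):                # temporal attention, per parameter
            H = self.history[p]                    # (hist_len, 1)
            Q, K, V = H @ self.W_q, H @ self.W_k, H @ self.W_v
            updates[p] = linear_attention(Q, K, V)[-1]   # read out the latest position
        return params - lr * updates.reshape(params.shape)

# Toy usage: a single optimizer step on a random 5-parameter "model".
opt = LearnedOptimizer()
params, grads = rng.normal(size=5), rng.normal(size=5)
params = opt.step(params, grads)
```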
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
Choromanski, Krzysztof Marcin, Li, Shanda, Likhosherstov, Valerii, Dubey, Kumar Avinava, Luo, Shengjie, He, Di, Yang, Yiming, Sarlos, Tamas, Weingarten, Thomas, Weller, Adrian
We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs). These include regular RPE techniques applied to non-geometric data, as well as novel RPEs operating on sequences of tokens embedded in higher-dimensional Euclidean spaces (e.g., point clouds). FLTs construct the optimal RPE mechanism implicitly by learning its spectral representation. Unlike other architectures combining efficient low-rank linear attention with RPEs, FLTs remain practical in terms of memory usage and do not require additional assumptions about the structure of the RPE mask. FLTs also allow for applying certain structural inductive bias techniques to specify masking strategies, e.g., they provide a way to learn the so-called local RPEs introduced in this paper, which yield accuracy gains compared with several other linear Transformers for language modeling. We also thoroughly test FLTs on other data modalities and tasks, such as image classification and 3D molecular modeling. For 3D data, FLTs are, to the best of our knowledge, the first Transformer architectures providing RPE-enhanced linear attention.
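As a hedged illustration rather than the authors' implementation, the snippet below sketches in NumPy the factorization idea the abstract alludes to: if the RPE mask entries f(r_i - r_j) are approximated through learnable frequencies (a Monte Carlo estimate of the Fourier representation of f), they split into per-query and per-key factors, so the mask can be folded into linear attention without materializing the L x L matrix. All names (flt_attention, feature_map, freqs, pos) are hypothetical, and the elu(x)+1 feature map stands in for whatever low-rank attention variant is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_map(X):
    # Generic positive feature map (elu(x) + 1 style) standing in for the
    # low-rank linear attention that the RPE mask is combined with.
    return np.where(X > 0, X + 1.0, np.exp(X))

def flt_attention(Q, K, V, pos, freqs):
    # Q, K: (L, d); V: (L, d_v); pos: (L, p) token positions; freqs: (m, p)
    # learnable spectral parameters. The RPE value f(r_i - r_j) is approximated by
    # (1/m) * sum_k exp(i w_k . r_i) * conj(exp(i w_k . r_j)), which factorizes,
    # so the mask folds into linear attention with no L x L matrix.
    Qf, Kf = feature_map(Q), feature_map(K)                      # (L, d)
    Pq = np.exp(1j * pos @ freqs.T) / np.sqrt(freqs.shape[0])    # (L, m)
    Pk = np.conj(Pq)
    Qc = (Qf[:, :, None] * Pq[:, None, :]).reshape(len(Q), -1)   # (L, d*m)
    Kc = (Kf[:, :, None] * Pk[:, None, :]).reshape(len(K), -1)   # (L, d*m)
    num = (Qc @ (Kc.T @ V)).real                                 # (L, d_v)
    den = (Qc @ Kc.sum(axis=0)).real                             # (L,)
    return num / (den[:, None] + 1e-6)

# Toy usage with 1D positions (plain sequences); pos could equally hold
# 3D point-cloud coordinates, in which case p = 3.
L, d, d_v, p, m = 6, 4, 4, 1, 8
Q, K, V = rng.normal(size=(L, d)), rng.normal(size=(L, d)), rng.normal(size=(L, d_v))
pos = np.arange(L, dtype=float)[:, None]
freqs = rng.normal(size=(m, p))   # would be trained jointly with the attention weights
print(flt_attention(Q, K, V, pos, freqs).shape)   # (6, 4)
```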