Sarlos, Tamas
Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers
Choromanski, Krzysztof, Sehanobish, Arijit, Chowdhury, Somnath Basu Roy, Lin, Han, Dubey, Avinava, Sarlos, Tamas, Chaturvedi, Snigdha
We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular, low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, and finally (d) Topological Transformers (TTs) (Choromanski et al., 2022) for images. For Topological Transformers, we propose new relative position encoding (RPE) masking mechanisms with as few as three extra learnable parameters per Transformer layer, leading to 1.0-1.5%+ accuracy gains. Importantly, most FTFIs are exact methods, and thus numerically equivalent to their brute-force counterparts. When applied to graphs with thousands of nodes, these exact algorithms provide 5.7-13x speedups. We also provide an extensive theoretical analysis of our methods.
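The brute-force computation that FTFIs match exactly is easy to state in code. The sketch below is an illustrative O(n^2) reference only (function names, the kernel f(d) = exp(-d), and the toy tree are assumptions, not the paper's API): it integrates a field x over a weighted tree by summing f(dist(i, j)) * x[j] for every node i, which is the baseline the polylog-linear FTFIs reproduce numerically.

```python
# Hypothetical brute-force tree-field integrator: the O(n^2) baseline that
# exact FTFIs reproduce numerically in polylog-linear time.
import collections
import numpy as np

def tree_distances(n, weighted_edges):
    """All-pairs distances on a weighted tree via a traversal from every node (O(n^2))."""
    adj = collections.defaultdict(list)
    for u, v, w in weighted_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = np.zeros((n, n))
    for s in range(n):
        stack, seen = [(s, 0.0)], {s}
        while stack:
            u, d = stack.pop()
            dist[s, u] = d
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, d + w))
    return dist

def brute_force_field_integration(weighted_edges, x, f=lambda d: np.exp(-d)):
    """y[i] = sum_j f(dist(i, j)) * x[j] for a field x defined on the tree nodes."""
    return f(tree_distances(len(x), weighted_edges)) @ x

# Toy example: a path tree 0-1-2-3 with unit edge weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
y = brute_force_field_integration(edges, np.ones(4))
```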
SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Leal, Isabel, Choromanski, Krzysztof, Jain, Deepali, Dubey, Avinava, Varley, Jake, Ryoo, Michael, Lu, Yao, Liu, Frederick, Sindhwani, Vikas, Vuong, Quan, Sarlos, Tamas, Oslund, Ken, Hausman, Karol, Rao, Kanishka
We present Self-Adaptive Robust Attention for Robotics Transformers (SARA-RT): a new paradigm for addressing the emerging challenge of scaling up Robotics Transformers (RT) for on-robot deployment. SARA-RT relies on a new fine-tuning method that we propose, called up-training. It converts pre-trained or already fine-tuned Transformer-based robotic policies of quadratic time complexity (including massive billion-parameter vision-language-action models, or VLAs) into their efficient linear-attention counterparts while maintaining high quality. We demonstrate the effectiveness of SARA-RT by speeding up: (a) the class of recently introduced RT-2 models, the first VLA robotic policies pre-trained on internet-scale data, as well as (b) Point Cloud Transformer (PCT) robotic policies operating on large point clouds. We complement our results with a rigorous mathematical analysis providing deeper insight into the SARA phenomenon.
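Up-training itself is a fine-tuning procedure, but the target computation is standard linear attention. The sketch below is my own illustration, not the SARA-RT code: the ReLU-style feature map and all names are assumptions. It contrasts quadratic softmax attention with a linear-attention rewrite that reorders the matrix products so the cost scales linearly in sequence length.

```python
# Illustrative contrast between O(L^2) softmax attention and a generic
# linear-attention rewrite; the feature map phi is an assumption, not SARA-RT's.
import numpy as np

def softmax_attention(Q, K, V):
    """O(L^2) attention: materializes the full L x L score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """O(L) attention: associates (phi(K)^T V) first, never forming an L x L matrix."""
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                      # d' x d summary of keys and values
    normalizer = Qf @ Kf.sum(axis=0)   # length-L normalization vector
    return (Qf @ kv) / normalizer[:, None]

L, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
out_quadratic = softmax_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)   # different (kernelized) attention, linear cost
```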
Hot PATE: Private Aggregation of Distributions for Diverse Tasks
Cohen, Edith, Lyu, Xin, Nelson, Jelani, Sarlos, Tamas, Stemmer, Uri
The Private Aggregation of Teacher Ensembles (PATE) framework~\cite{PapernotAEGT:ICLR2017} is a versatile approach to privacy-preserving machine learning. In PATE, teacher models are trained on distinct portions of sensitive data, and their predictions are privately aggregated to label new training examples for a student model. Until now, PATE has primarily been explored with classification-like tasks, where each example possesses a ground-truth label, and knowledge is transferred to the student by labeling public examples. Generative AI models, however, excel in open-ended \emph{diverse} tasks with multiple valid responses and scenarios that may not align with traditional labeled examples. Furthermore, the knowledge of models is often encapsulated in the response distribution itself and may be transferred from teachers to the student in a more fluid way. We propose \emph{hot PATE}, tailored for the diverse setting. In hot PATE, each teacher model produces a response distribution, and the aggregation method must preserve both privacy and diversity of responses. We demonstrate, analytically and empirically, that hot PATE achieves privacy-utility tradeoffs that are comparable to, and in diverse settings significantly surpass, those of the baseline ``cold'' PATE.
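For reference against the ``cold'' baseline mentioned above, here is a minimal sketch of classic PATE noisy plurality-vote aggregation. Function names, the Laplace noise scale, and the toy numbers are illustrative assumptions; the hot-PATE aggregation of full response distributions is defined in the paper and not reproduced here.

```python
# Baseline "cold" PATE aggregation (noisy plurality vote), shown only as a
# point of comparison for the distribution-level aggregation of hot PATE.
import numpy as np

def cold_pate_label(teacher_votes, num_classes, noise_scale, rng):
    """teacher_votes: one hard label per teacher; returns the noisy-argmax label."""
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=noise_scale, size=num_classes)  # calibrated noise for DP
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
# e.g. 250 teachers, 10 possible responses; in the diverse setting each teacher
# would instead return a whole distribution over many valid responses.
votes = rng.integers(low=0, high=10, size=250)
label = cold_pate_label(votes, num_classes=10, noise_scale=2.0, rng=rng)
```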
Hardness of Low Rank Approximation of Entrywise Transformed Matrix Products
Sarlos, Tamas, Song, Xingyou, Woodruff, David, Zhang, Qiuyi
Inspired by fast algorithms in natural language processing, we study low rank approximation in the entrywise transformed setting where we want to find a good rank $k$ approximation to $f(U \cdot V)$, where $U, V^\top \in \mathbb{R}^{n \times r}$ are given, $r = O(\log(n))$, and $f(x)$ is a general scalar function. Previous work in sublinear low rank approximation has shown that if both (1) $U = V^\top$ and (2) $f(x)$ is a PSD kernel function, then there is an $O(nk^{\omega-1})$ time constant relative error approximation algorithm, where $\omega \approx 2.376$ is the exponent of matrix multiplication. We give the first conditional time hardness results for this problem, demonstrating that both conditions (1) and (2) are in fact necessary for getting better than $n^{2-o(1)}$ time for a relative error low rank approximation for a wide class of functions. We give novel reductions from the Strong Exponential Time Hypothesis (SETH) that rely on lower bounding the leverage scores of flat sparse vectors and hold even when the rank of the transformed matrix $f(UV)$ and the target rank are $n^{o(1)}$, and when $U = V^\top$. Furthermore, even when $f(x) = x^p$ is a simple polynomial, we give runtime lower bounds in the case when $U \neq V^\top$ of the form $\Omega(\min(n^{2-o(1)}, 2^p))$. Lastly, we demonstrate that our lower bounds are tight by giving an $O(n \cdot \text{poly}(k, 2^p, 1/\epsilon))$ time relative error approximation algorithm and a fast $O(n \cdot \text{poly}(k, p, 1/\epsilon))$ additive error approximation using fast tensor-based sketching. Additionally, since our low rank algorithms rely on matrix-vector product subroutines, our lower bounds extend to show that computing $f(UV)W$, for even a small matrix $W$, requires $\Omega(n^{2-o(1)})$ time.
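The object of study is easy to state in code. The sketch below is purely illustrative (names, sizes, and the choice f(x) = x^3 are assumptions): it forms f(U V) explicitly in Theta(n^2 r) time and takes the best rank-k approximation via truncated SVD, i.e., the brute-force computation whose avoidance the hardness results above concern.

```python
# Illustrative brute-force rank-k approximation of the entrywise transform f(U V):
# the Theta(n^2) construction that the conditional lower bounds show cannot be
# beaten for relative error without conditions (1) and (2).
import numpy as np

def brute_force_low_rank(U, V, f, k):
    """Best rank-k approximation (in Frobenius norm) of f(U @ V) via truncated SVD."""
    A = f(U @ V)                                   # n x n, costs Theta(n^2 r) to form
    Uk, s, Vh = np.linalg.svd(A, full_matrices=False)
    return (Uk[:, :k] * s[:k]) @ Vh[:k, :]

n, r, k = 512, 9, 8                                # r = O(log n), as in the setting above
rng = np.random.default_rng(0)
U, Vt = rng.normal(size=(n, r)), rng.normal(size=(n, r))   # Vt plays the role of V^T
A_k = brute_force_low_rank(U, Vt.T, f=lambda x: x**3, k=k) # f(x) = x^p with p = 3
```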
Efficient Graph Field Integrators Meet Point Clouds
Choromanski, Krzysztof, Sehanobish, Arijit, Lin, Han, Zhao, Yunfan, Berger, Eli, Parshakova, Tetiana, Pan, Alvin, Watkins, David, Zhang, Tianyi, Likhosherstov, Valerii, Chowdhury, Somnath Basu Roy, Dubey, Avinava, Jain, Deepali, Sarlos, Tamas, Chaturvedi, Snigdha, Weller, Adrian
We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion (RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform an exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.
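For concreteness, the quadratic brute-force counterpart of this field integration can be sketched as follows. This is an illustration only (the kernel f(d) = exp(-d), the epsilon value, and all names are assumptions), not SeparatorFactorization or RFDiffusion: it builds an epsilon-nearest-neighbor graph of a point cloud and integrates a field against shortest-path distances.

```python
# Brute-force graph-field integration on an epsilon-NN graph of a point cloud:
# the quadratic baseline that SF and RFD are designed to accelerate.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def epsilon_nn_graph(points, eps):
    """Sparse weighted adjacency: edge (i, j) with weight ||p_i - p_j|| if within eps."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return csr_matrix(np.where((D <= eps) & (D > 0), D, 0.0))

def graph_field_integration(points, x, eps=0.3, f=lambda d: np.exp(-d)):
    """y[i] = sum_j f(shortest_path_dist(i, j)) * x[j] on the epsilon-NN graph."""
    dist = shortest_path(epsilon_nn_graph(points, eps), directed=False)
    dist[~np.isfinite(dist)] = dist[np.isfinite(dist)].max()  # cap disconnected pairs
    return f(dist) @ x

rng = np.random.default_rng(0)
cloud = rng.random(size=(200, 3))              # 200 points in the unit cube
y = graph_field_integration(cloud, rng.normal(size=200))
```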
ES-ENAS: Efficient Evolutionary Optimization for Large Hybrid Search Spaces
Song, Xingyou, Choromanski, Krzysztof, Parker-Holder, Jack, Tang, Yunhao, Zhang, Qiuyi, Peng, Daiyi, Jain, Deepali, Gao, Wenbo, Pacchiano, Aldo, Sarlos, Tamas, Yang, Yuxiang
In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate, both theoretically and empirically, that previous mutation-based evolutionary algorithms, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high-dimensional continuous spaces, which limits their scope over hybrid search spaces as well. In order to combat this curse, we propose ES-ENAS, a simple and modular joint optimization procedure combining the class of sample-efficient smoothed-gradient techniques, commonly known as Evolutionary Strategies (ES), with combinatorial optimizers in a highly scalable and intuitive way, inspired by the one-shot or supernet paradigm introduced in Efficient Neural Architecture Search (ENAS). By doing so, we achieve significantly better sample efficiency, which we demonstrate empirically on synthetic benchmarks, and we further apply ES-ENAS to architecture search over popular RL benchmarks.
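The continuous half of this procedure is the standard antithetic ES smoothed-gradient estimator; the sketch below shows that component only, under illustrative names and hyperparameters, and omits the combinatorial (ENAS-style controller / supernet) half entirely.

```python
# Minimal antithetic ES gradient estimator for the continuous parameters;
# the combinatorial optimizer half of ES-ENAS is not shown here.
import numpy as np

def es_gradient(objective, theta, sigma=0.1, num_perturbations=32, rng=None):
    """Antithetic Monte Carlo estimate of the gradient of the Gaussian-smoothed objective."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(num_perturbations):
        eps = rng.normal(size=theta.shape)
        grad += (objective(theta + sigma * eps) - objective(theta - sigma * eps)) * eps
    return grad / (2.0 * sigma * num_perturbations)

# Toy usage: ascend a blackbox reward over continuous weights theta.
reward = lambda w: -np.sum((w - 1.0) ** 2)       # stand-in for an RL rollout return
rng = np.random.default_rng(0)
theta = np.zeros(10)
for _ in range(200):
    theta += 0.05 * es_gradient(reward, theta, rng=rng)
```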
Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers
Choromanski, Krzysztof Marcin, Li, Shanda, Likhosherstov, Valerii, Dubey, Kumar Avinava, Luo, Shengjie, He, Di, Yang, Yiming, Sarlos, Tamas, Weingarten, Thomas, Weller, Adrian
We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding (RPE) mechanisms. These include regular RPE techniques applied to non-geometric data, as well as novel RPEs operating on sequences of tokens embedded in higher-dimensional Euclidean spaces (e.g., point clouds). FLTs construct the optimal RPE mechanism implicitly by learning its spectral representation. As opposed to other architectures combining efficient low-rank linear attention with RPEs, FLTs remain practical in terms of their memory usage and do not require additional assumptions about the structure of the RPE mask. FLTs also allow for applying certain structural inductive-bias techniques to specify masking strategies; for example, they provide a way to learn the so-called local RPEs introduced in this paper, which yield accuracy gains compared with several other linear Transformers for language modeling. We also thoroughly test FLTs on other data modalities and tasks, such as image classification and 3D molecular modeling. For 3D data, FLTs are, to the best of our knowledge, the first Transformer architectures providing RPE-enhanced linear attention.
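The core idea of parameterizing an RPE mask through its spectral representation can be illustrated as follows. This is a hedged sketch under simplifying assumptions (the cosine parameterization, the parameter counts, and all names are mine): the mask entry depends only on i - j and is synthesized from a small set of learnable frequencies and coefficients; the actual FLT parameterization and its integration with linear attention follow the paper, not this sketch.

```python
# Hedged illustration of building a Toeplitz RPE mask from a learned spectral
# representation: a handful of frequencies/coefficients instead of a free function.
import numpy as np

def rpe_mask_from_spectrum(seq_len, freqs, coefs):
    """M[i, j] = sum_r coefs[r] * cos(freqs[r] * (i - j)) -- a relative-position bias."""
    rel = np.arange(seq_len)[:, None] - np.arange(seq_len)[None, :]
    return np.cos(rel[..., None] * freqs).dot(coefs)

rng = np.random.default_rng(0)
freqs, coefs = rng.normal(size=8), rng.normal(size=8)   # learnable parameters in practice
mask = rpe_mask_from_spectrum(128, freqs, coefs)        # 128 x 128 bias from 16 parameters
```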
FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features
Likhosherstov, Valerii, Choromanski, Krzysztof, Dubey, Avinava, Liu, Frederick, Sarlos, Tamas, Weller, Adrian
The problem of efficiently approximating a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs), which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate the Gaussian and softmax kernels. In contrast to traditional RF approximations, the parameters of these new methods can be optimized to reduce the variance of the approximation, and the optimum can be expressed in closed form. We show that our methods lead to variance reduction in practice ($e^{10}$-times smaller variance and beyond) and outperform previous methods in a kernel regression task. Using our proposed mechanism, we also present FAVOR#, a method for self-attention approximation in Transformers. We show that FAVOR# outperforms other random feature methods in speech modelling and natural language processing.
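As background for the construction above, the sketch below shows the unparameterized positive random-feature estimator of the softmax kernel K(x, y) = exp(x . y); this is the baseline that FAVOR# generalizes by introducing optimizable parameters, and the optimized parameterization itself is defined in the paper, not here. The scaling of x and y and all names are illustrative.

```python
# Positive random features giving an unbiased estimate of the softmax kernel;
# FAVOR# extends this baseline with parameters optimized to reduce variance.
import numpy as np

def positive_random_features(X, W):
    """phi(x)_i = exp(w_i . x - ||x||^2 / 2) / sqrt(m); E[phi(x) . phi(y)] = exp(x . y)."""
    m = W.shape[0]
    return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 16, 4096
x, y = rng.normal(size=d) / d**0.25, rng.normal(size=d) / d**0.25
W = rng.normal(size=(m, d))
estimate = positive_random_features(x[None], W) @ positive_random_features(y[None], W).T
exact = np.exp(x @ y)   # unbiased in expectation; the variance is what FAVOR# reduces
```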
Differentially Private Weighted Sampling
Cohen, Edith, Geri, Ofir, Sarlos, Tamas, Stemmer, Uri
Common datasets have the form of {\em elements} with {\em keys} (e.g., transactions and products), and the goal is to perform analytics on the aggregated form of {\em key} and {\em frequency} pairs. A weighted sample of keys by (a function of) frequency is a highly versatile summary that provides a sparse set of representative keys and supports approximate evaluations of query statistics. We propose {\em private weighted sampling} (PWS): a method that ensures element-level differential privacy while retaining, to the extent possible, the utility of a respective non-private weighted sample. PWS maximizes the reporting probabilities of keys and also improves over the state of the art for the well-studied special case of {\em private histograms}, when no sampling is performed. We empirically demonstrate significant performance gains compared with prior baselines: a 20\%-300\% increase in key reporting for common Zipfian frequency distributions and accuracy for $2\times$-$8\times$ lower frequencies in estimation tasks. Moreover, PWS is applied as a simple post-processing of a non-private sample, without requiring the original data. This allows for seamless integration with existing implementations of non-private schemes and retains the efficiency of schemes designed for resource-constrained settings such as massive distributed or streamed data. We believe that due to practicality and performance, PWS may become a method of choice in applications where privacy is desired.
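To ground the post-processing claim above, the sketch below shows a standard non-private weighted (Poisson) sample of keys by frequency, the kind of sample PWS can post-process into a private one. The inclusion rule min(1, freq / tau), the threshold value, and all names are illustrative assumptions; the private post-processing that maximizes key reporting probabilities is defined in the paper and not reproduced here.

```python
# Non-private probability-proportional-to-size (Poisson) sample of keys by frequency:
# an example of the weighted samples that PWS post-processes under differential privacy.
import numpy as np

def weighted_poisson_sample(frequencies, tau, rng):
    """Include each key independently with probability min(1, freq / tau)."""
    probs = np.minimum(1.0, np.asarray(frequencies, dtype=float) / tau)
    keep = rng.random(size=len(probs)) < probs
    # Inverse-probability weights keep downstream frequency statistics unbiased.
    return np.flatnonzero(keep), 1.0 / probs[keep]

rng = np.random.default_rng(0)
freqs = rng.zipf(a=2.0, size=10_000)     # Zipfian key frequencies, as in the experiments
keys, weights = weighted_poisson_sample(freqs, tau=50.0, rng=rng)
```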
Rethinking Attention with Performers
Choromanski, Krzysztof, Likhosherstov, Valerii, Dohan, David, Song, Xingyou, Gane, Andreea, Sarlos, Tamas, Hawkins, Peter, Davis, Jared, Mohiuddin, Afroz, Kaiser, Lukasz, Belanger, David, Colwell, Lucy, Weller, Adrian
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ can also be used to efficiently model kernelizable attention mechanisms beyond softmax. This representational power is crucial to accurately compare softmax with other kernels for the first time on large-scale tasks, beyond the reach of regular Transformers, and to investigate optimal attention-kernels. Performers are linear architectures fully compatible with regular Transformers and with strong theoretical guarantees: unbiased or nearly-unbiased estimation of the attention matrix, uniform convergence, and low estimation variance. We tested Performers on a rich set of tasks stretching from pixel prediction through text models to protein sequence modeling. We demonstrate competitive results with other examined efficient sparse and dense attention methods, showcasing the effectiveness of the novel attention-learning paradigm leveraged by Performers.
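A minimal sketch of a FAVOR+-style pipeline is given below: positive random features with orthogonal rows, combined with the linear-time reordering of the attention products. Constants, scalings, and names here are simplified assumptions relative to the paper, so this is an illustration of the mechanism rather than the Performer implementation.

```python
# Sketch of FAVOR+-style approximate softmax attention: positive random features
# drawn with orthogonal rows, composed in time linear in the sequence length.
import numpy as np

def orthogonal_gaussian(m, d, rng):
    """Random projection rows that are orthogonal within blocks, rescaled to Gaussian-like norms."""
    blocks = []
    while sum(b.shape[0] for b in blocks) < m:
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
        blocks.append(Q.T * np.sqrt(rng.chisquare(d, size=(d, 1))))
    return np.concatenate(blocks)[:m]

def favor_plus_attention(Q, K, V, W):
    """Linear-time approximation of softmax attention built from positive feature maps."""
    scale = Q.shape[-1] ** -0.25        # exp(q.k / sqrt(d)) as a kernel of rescaled q, k
    def phi(X):
        X = X * scale
        return np.exp(X @ W.T - 0.5 * np.sum(X**2, axis=-1, keepdims=True)) / np.sqrt(W.shape[0])
    Qf, Kf = phi(Q), phi(K)
    return (Qf @ (Kf.T @ V)) / (Qf @ Kf.sum(axis=0))[:, None]

rng = np.random.default_rng(0)
L, d, m = 512, 64, 256
Q, K, V = (rng.normal(size=(L, d)) for _ in range(3))
W = orthogonal_gaussian(m, d, rng)
approx_attention = favor_plus_attention(Q, K, V, W)
```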