Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning
Jannik Kossen, Neil Band, Clare Lyle, Aidan N. Gomez, Tom Rainforth, Yarin Gal
We challenge a common assumption underlying most supervised deep learning: that a model makes a prediction depending only on its parameters and the features of a single input. To this end, we introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time. Our approach uses self-attention to reason about relationships between datapoints explicitly, which can be seen as realizing non-parametric models using parametric attention mechanisms. However, unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction. Empirically, our models solve cross-datapoint lookup and complex reasoning tasks unsolvable by traditional deep learning models. We show highly competitive results on tabular data, early results on CIFAR-10, and give insight into how the model makes use of the interactions between points.
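The core mechanism can be illustrated in a few lines of PyTorch. This is a minimal sketch of self-attention applied across the dataset axis, not the authors' implementation: it treats the whole set of embedded datapoints as one sequence and uses the standard `nn.MultiheadAttention` module as a stand-in for the paper's attention-between-datapoints layer, so each datapoint's representation can condition on every other datapoint.

```python
import torch
import torch.nn as nn

class AttentionBetweenDatapoints(nn.Module):
    """Sketch: multi-head self-attention over datapoints, not features."""

    def __init__(self, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_datapoints, embed_dim) -- the entire dataset at once.
        h = x.unsqueeze(0)                    # (1, n, d): one "sequence" of datapoints
        out, _ = self.attn(h, h, h)           # each datapoint attends to all others
        return self.norm(x + out.squeeze(0))  # residual connection + layer norm

# Usage: mix information across 100 embedded datapoints of dimension 64.
datapoints = torch.randn(100, 64)
layer = AttentionBetweenDatapoints()
mixed = layer(datapoints)  # (100, 64); row i now depends on the full dataset
```

Because the attention weights are learned end-to-end, the model can discover on its own which other datapoints are relevant for a given prediction, rather than relying on a fixed similarity measure as conventional non-parametric methods do.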
June 4, 2021