Supplementary Material for Kernel Identification Through Transformers

A  Background: Self-Attention

Neural Information Processing Systems 

Since the attention mechanism is rarely used within the GP literature, we provide a brief review of the topic in this section. Below we follow the description of attention given by Vaswani et al. [8], including its extensions to self-attention and multi-head self-attention. The dot-product attention mechanism [8] takes as input a set of queries, keys, and values.
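As a minimal illustration of the mechanism described above, the following NumPy sketch computes scaled dot-product attention, softmax(QKᵀ/√d_k)V, and applies it in the self-attention setting where the queries, keys, and values are all derived from the same input. The function name, shapes, and example data are our own illustrative choices, not part of the paper.

```python
import numpy as np

def dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                          # (n_q, d_v) weighted values

# Self-attention: queries, keys, and values all come from the same input X
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))                # 5 tokens of dimension 4
out = dot_product_attention(X, X, X)
print(out.shape)                               # (5, 4)
```

In practice (and in multi-head self-attention) the queries, keys, and values are linear projections of the input rather than the raw input itself; this sketch omits those learned projections for brevity.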
