Supplementary Material for Kernel Identification Through Transformers

A Background: Self-Attention

Neural Information Processing Systems 

Since the attention mechanism is rarely used within the GP literature, we provide a brief review of the topic in this section. We follow the description of attention given by Vaswani et al. [8], including its extensions to self-attention and multi-head self-attention. The dot-product attention mechanism [8] takes as input a set of queries, keys and values. The queries and keys have dimension $D_z$, and the values have dimension $D_v$, which may differ from $D_z$. Dot-product attention then computes weights from the queries and keys and uses these weights to form a weighted linear combination of the input values.
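To make the mechanism concrete, the following is a minimal pure-Python sketch of dot-product attention. It assumes the scaled variant of Vaswani et al., which computes $\mathrm{softmax}(QK^\top/\sqrt{D_z})\,V$; the $1/\sqrt{D_z}$ scaling and the function names `softmax` and `dot_product_attention` are illustrative assumptions, not the implementation used in this work. Plain lists are used for readability rather than efficiency.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a single row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Hypothetical sketch of scaled dot-product attention.

    queries: N_q x D_z, keys: N_k x D_z, values: N_k x D_v (D_v may differ from D_z).
    Returns an N_q x D_v matrix whose rows are weighted combinations of the value rows.
    """
    d_z = len(keys[0])
    if any(len(q) != d_z for q in queries) or any(len(k) != d_z for k in keys):
        raise ValueError("queries and keys must share dimension D_z")
    if len(keys) != len(values):
        raise ValueError("need exactly one value vector per key")
    d_v = len(values[0])
    scale = 1.0 / math.sqrt(d_z)  # assumed scaling, as in Vaswani et al.

    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by 1/sqrt(D_z).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        # Attention weights: non-negative and summing to one across the keys.
        weights = softmax(scores)
        # Output row: weighted combination of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)])
    return outputs


# Usage: one query, two keys with D_z = 2, and values with D_v = 3.
out = dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
)
print(out)  # a 1 x 3 matrix, weighted toward the first value row
```

Because the weights are a softmax over keys, each output row lies in the convex hull of the value vectors; if all keys are identical, every query attends uniformly and receives the mean of the values.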
