Coneheads: Hierarchy Aware Attention

Neural Information Processing Systems 

Attention networks rely heavily on the dot-product attention operator, which computes the similarity between two points by taking their inner product. However, the inner product does not explicitly model the complex structural properties of real-world datasets, such as hierarchies between data points.
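To make the operator concrete, here is a minimal NumPy sketch of dot-product attention. It is an illustration only, not the paper's code: the function name `dot_product_attention`, the 2-D array shapes, and the 1/sqrt(d) scaling (the usual transformer convention) are assumptions that do not come from the text above.

```python
import numpy as np

def dot_product_attention(queries, keys, values):
    """Illustrative scaled dot-product attention (hypothetical helper).

    queries: (n_q, d), keys: (n_k, d), values: (n_k, d_v).
    Returns the attended values (n_q, d_v) and the attention weights (n_q, n_k).
    """
    d = queries.shape[-1]
    # Similarity between each query and each key is their inner product.
    # Dividing by sqrt(d) is the common transformer scaling (an assumption here).
    scores = queries @ keys.T / np.sqrt(d)
    # Row-wise softmax; subtracting the row maximum keeps exp() numerically stable.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value rows.
    return weights @ values, weights
```

In this sketch the score between a query and a key depends only on their inner product, so nothing in the operator itself encodes whether one point sits above or below another in a hierarchy.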
