ML Collective's ICML Paper: A Probabilistic Interpretation of Transformers

May-20-2022, 07:07:56 GMT | #artificialintelligence

Since their introduction in 2017, transformers have become the go-to machine learning architecture for natural language processing (NLP) and computer vision. Although they have achieved state-of-the-art performance in these fields, the theoretical framework underlying them remains relatively underexplored. In the new paper "A Probabilistic Interpretation of Transformers", ML Collective researcher Alexander Shim provides a probabilistic explanation of transformers' exponential dot product attention and contrastive learning, based on distributions of the exponential family. An oft-proposed explanation for transformers' power and performance is the superior ability of their attention mechanisms to model dependencies in long input sequences. But this does not directly address how and why architectural choices such as exponential dot product attention outperform the alternatives.
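For readers unfamiliar with the term, exponential dot product attention is usually understood as weighting each value by the exponential of the query-key dot product, normalized over all keys (a softmax). The following is a minimal illustrative sketch under that assumption, not code from the paper: the function name `exp_dot_attention` is hypothetical, and the scaling factor commonly applied to the dot product in practice is omitted for brevity.

```python
import math


def exp_dot_attention(query, keys, values):
    """Attend over `values` with weights proportional to exp(query . key).

    Hypothetical helper for illustration only; inputs are plain lists of floats.
    """
    # Dot product between the query and every key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Subtract the largest score before exponentiating for numerical
    # stability; the normalized weights are unchanged by this shift.
    top = max(scores)
    unnormalized = [math.exp(s - top) for s in scores]
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output
```

Because the weights are non-negative and sum to one, they can be read as a probability distribution over the keys, which is the kind of reading a probabilistic interpretation builds on.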




