Attention that does not Explain Away
Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut
Transformer-based models have achieved strong performance in a variety of machine learning tasks, such as machine translation (Vaswani et al., 2017; Dehghani et al., 2019), language modeling (Devlin et al., 2019; Yang et al., 2019), summarization (Cohan et al., 2018; Goodman et al., 2019), dialog (Mazaré et al., 2018; Cheng et al., 2019), image captioning (Sharma et al., 2018; Zhao et al., 2019), and visual question answering (Yu et al., 2019b; Tan and Bansal, 2019). One of the most important components of the Transformer architecture is its self-attention mechanism, applied universally to both the encoder and the decoder components.

Under a Gaussian mixture model (GMM) view of attention, not all Gaussian centers (lower-layer neurons) are required to contribute to generating the output data (upper-layer neurons). The information of the centers that do not generate data is lost after observing the data. This "explaining-away" effect is related to the one in directed graphical models, in the sense that the few contributing lower neurons "explain away" the other, muted lower neurons in generating the upper neurons. In order to compensate for this, we describe a doubly-normalized attention scheme.
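The excerpt leans on this GMM reading without spelling it out. One standard way to make it concrete (the notation here is ours; we assume unit-variance components and uniform mixing weights) treats the upper-layer neurons as data points $x_i$ and the lower-layer neurons as centers $\mu_j$; the posterior responsibility of center $j$ for $x_i$ is then

\[ p(z_i = j \mid x_i) = \frac{\exp\!\left(-\tfrac{1}{2}\lVert x_i - \mu_j \rVert^2\right)}{\sum_{j'} \exp\!\left(-\tfrac{1}{2}\lVert x_i - \mu_{j'} \rVert^2\right)} = \frac{\exp\!\left(x_i^\top \mu_j - \tfrac{1}{2}\lVert \mu_j \rVert^2\right)}{\sum_{j'} \exp\!\left(x_i^\top \mu_{j'} - \tfrac{1}{2}\lVert \mu_{j'} \rVert^2\right)}, \]

which has the same softmax-over-keys form as attention with query $x_i$ and keys $\mu_j$. Because the responsibilities for each $x_i$ sum to one over the centers, a few high-scoring centers can absorb nearly all of the mass, leaving the remaining centers with almost no influence on any output.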
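To see the effect numerically, here is a minimal numpy sketch (the sizes, names, and the +6 score offset are illustrative, not from the paper). It builds attention logits in which one key uniformly dominates, shows that the standard per-query softmax lets that key absorb almost all of the attention mass, and contrasts a doubly-normalized variant that normalizes over queries before keys; this is only one plausible realization of double normalization, and the paper's exact formulation may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_queries, n_keys = 4, 6

# Illustrative attention logits: key 0 scores uniformly higher,
# like a Gaussian center that sits close to every data point.
logits = rng.normal(size=(n_queries, n_keys))
logits[:, 0] += 6.0

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Standard attention: softmax over keys, separately for each query.
attn = softmax(logits, axis=1)
# Total mass each key receives across all queries: key 0 takes
# nearly all 4 units; the other keys are "explained away".
print(np.round(attn.sum(axis=0), 3))

# Doubly-normalized sketch: normalize over queries first, then over keys.
over_queries = softmax(logits, axis=0)
dn_attn = over_queries / over_queries.sum(axis=1, keepdims=True)
# Every key now retains a nontrivial share of the total mass.
print(np.round(dn_attn.sum(axis=0), 3))
```

With the standard softmax, the per-key mass concentrates almost entirely on the dominant key; after double normalization, every key keeps a meaningful share, which is the behavior the paragraph above motivates.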
Sep-29-2020