Attention that does not Explain Away

Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut

arXiv.org Machine Learning 

The Transformer architecture has demonstrated strong performance in a variety of machine learning tasks, such as machine translation (Vaswani et al., 2017; Dehghani et al., 2019), language modeling (Devlin et al., 2019; Yang et al., 2019), summarization (Cohan et al., 2018; Goodman et al., 2019), dialog (Mazaré et al., 2018; Cheng et al., 2019), image captioning (Sharma et al., 2018; Zhao et al., 2019), and visual question answering (Yu et al., 2019b; Tan and Bansal, 2019). One of the most important components of the Transformer architecture is its self-attention mechanism, applied universally to both the encoder and the decoder components.

This is because, for a GMM, not all Gaussian centers (lower-layer neurons) are required to contribute to generating the output data (upper-layer neurons). The information of the centers that do not generate data is lost after the data is observed. This "explaining-away" effect is related to the one in directed graphical models, in the sense that the existence of the few contributing lower neurons "explains away" the other, muted lower neurons in generating the upper neurons. In order to compensate for this, we describe ...
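To make the effect concrete, the following minimal sketch (not from the paper; the dimensions, random seed, and the softmax helper are illustrative assumptions) shows explaining away in standard scaled dot-product attention: because the softmax weights are normalized to sum to one, a query that matches one key strongly drives the weights of all other keys toward zero, so the values at those muted positions contribute almost nothing to the output.

    import numpy as np

    # Minimal sketch of "explaining away" in scaled dot-product attention.
    # Sizes are illustrative assumptions: 5 lower-layer positions, hidden
    # size d = 8.

    def softmax(x):
        e = np.exp(x - x.max())          # shift for numerical stability
        return e / e.sum()

    rng = np.random.default_rng(0)
    d = 8
    keys = rng.normal(size=(5, d))       # lower-layer neurons (Gaussian centers)
    values = rng.normal(size=(5, d))

    query = 4.0 * keys[0]                # a query aligned with key 0 only
    scores = keys @ query / np.sqrt(d)   # scaled dot-product attention scores
    weights = softmax(scores)            # normalized to sum to 1

    print(np.round(weights, 3))          # key 0 takes nearly all attention mass
    output = weights @ values            # muted positions barely affect output

Running this, the aligned key receives nearly all of the attention mass while the remaining positions are effectively silenced, mirroring the muted Gaussian centers described above.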
