


88dddaf430b5bc38ab8228902bb61821-Supplemental-Conference.pdf

Neural Information Processing Systems

Supplementary Figure 1. Ablation study: each row represents the ablated layer and each column the module ablated from that layer; for example, the first panel shows ablation of attention key in layer 5. Different layers of the GPT2-XL model were ablated and the effect of ablation on curvature was measured for 2000 sentences from the UD corpus. The red bar marks the layer where ablation was applied.

Supplementary Figure 3. A: curvature values for 2000 sampled sentences in the RWKV model (an RNN), for both the trained and untrained versions. B: correlation between model-generated surprisal and curvature in the RWKV model. Diamonds: syntactic surprisal.

Supplementary Figure 5. Effect of different decoding strategies on GPT2-XL sequence generation and its comparison to ground truth ("true"); same as Figure 4b in the main manuscript.
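The captions above describe measuring the curvature of a model's hidden-state trajectory across a sentence. A minimal numpy sketch of one common curvature definition, the angle between consecutive hidden-state difference vectors, is shown below; the function name and exact definition are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def trajectory_curvature(hidden_states):
    """Average curvature of a trajectory of hidden states.

    Curvature at step t is the angle between consecutive
    difference vectors v_t = h_{t+1} - h_t and v_{t-1}.
    """
    h = np.asarray(hidden_states, dtype=float)
    v = np.diff(h, axis=0)                        # step vectors
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # unit-normalize
    cos = np.sum(v[:-1] * v[1:], axis=1)           # cosine of each turn
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
```

A straight-line trajectory gives curvature 0, while a right-angle turn gives pi/2; an ablation's effect would then be summarized by the change in this value averaged over sentences.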



dc6a7e655d7e5840e66733e9ee67cc69-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for the helpful suggestions. We will incorporate the following analysis into our revision. Firstly, we found 4 typical attention patterns shared by both XLNet and BERT, shown in Figure 1; rows and columns represent queries and keys, respectively.
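The query/key convention described above corresponds to the standard attention-weight matrix. A minimal numpy sketch (the function name is an assumption; this is generic scaled dot-product attention, not code from the paper):

```python
import numpy as np

def attention_map(Q, K):
    """Attention weights for one head: rows index queries,
    columns index keys, matching the figure convention."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])        # scaled dot products
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=1, keepdims=True)       # softmax over keys
```

Each row of the returned matrix sums to 1, so a "pattern" (e.g. diagonal or vertical stripes) describes how each query distributes its weight over keys.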




Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva, Leila Wehbe

Neural Information Processing Systems

We use brain imaging recordings of subjects reading complex natural text to interpret word and sequence embeddings from 4 recent NLP models: ELMo, USE, BERT, and Transformer-XL. We study how their representations differ across layer depth, context length, and attention type.
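Work of this kind typically fits an encoding model: a regularized linear map from NLP-model features to brain recordings, evaluated by held-out prediction accuracy. A minimal closed-form ridge-regression sketch under that assumption (the function name and shapes are illustrative, not the paper's actual pipeline):

```python
import numpy as np

def fit_encoding_model(X, Y, alpha=1.0):
    """Ridge regression from NLP features X (n_samples, n_features)
    to brain recordings Y (n_samples, n_voxels).

    Returns weights W such that X @ W approximates Y, using the
    closed-form solution (X'X + alpha*I)^{-1} X'Y.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)
```

Comparing held-out prediction accuracy across layers, context lengths, or attention variants then indicates which representations best match the brain data.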




3e9f0fc9b2f89e043bc6233994dfcf76-AuthorFeedback.pdf

Neural Information Processing Systems

We appreciate this point and will revisit the word choice. What is given to the turkers? We will provide the full prompt in the revision, along with other details (we used 327 annotators) and discussion. For overall trustworthiness, for instance, we asked "Does the article read like it comes from a trustworthy source?" Nevertheless, BERT is worse at neural fake news discrimination compared with Grover.