
Collaborating Authors

 gpt2



88dddaf430b5bc38ab8228902bb61821-Supplemental-Conference.pdf

Neural Information Processing Systems

Supplementary figure 1. Ablation study: each row represents the ablated layer and each column the module ablated from that layer; for example, the first panel shows ablation of attention-key in layer 5. Different layers of the GPT2-XL model were ablated, and the consequence of ablation on curvature was measured for 2000 sentences in the UD corpus. The red bar shows the layer where ablation was applied.

Supplementary figure 3. A. Curvature values for 2000 sampled sentences in the RWKV model (an RNN), for both the trained and untrained versions. B. Correlation between model-generated surprisal and curvature in the RWKV model. Diamonds: syntactic surprisal.

Supplementary figure 5. Effect of different decoding strategies on GPT2-XL sequence generation and its comparison to ground truth (true); same as figure 4b in the main manuscript.
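The ablation-and-curvature pipeline described in Supplementary figure 1 can be illustrated with a short sketch. This is a minimal example and not the authors' code: it assumes curvature is the mean angle between consecutive hidden-state difference vectors across tokens, approximates the attention-key ablation by zeroing the key slice of the fused c_attn projection, and uses the small gpt2 checkpoint as a lightweight stand-in for GPT2-XL.

# Minimal sketch of the ablation-and-curvature measurement described above.
# Assumptions (not taken from the supplement): curvature is the mean angle
# between consecutive difference vectors of a layer's hidden states, and the
# "attention-key" ablation is approximated by zeroing the key slice of c_attn.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # stand-in for "gpt2-xl" to keep the example light
tok = GPT2TokenizerFast.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name).eval()

def ablate_attention_key(model, layer):
    """Zero the key projection of the fused c_attn weight in one block."""
    block = model.transformer.h[layer]
    n_embd = model.config.n_embd
    with torch.no_grad():
        # c_attn.weight has shape (n_embd, 3*n_embd): [query | key | value]
        block.attn.c_attn.weight[:, n_embd:2 * n_embd] = 0.0
        block.attn.c_attn.bias[n_embd:2 * n_embd] = 0.0

def sentence_curvature(sentence, layer):
    """Mean angle (radians) between consecutive hidden-state steps at `layer`."""
    ids = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    h = out.hidden_states[layer][0]      # (seq_len, n_embd)
    v = h[1:] - h[:-1]                   # difference vectors between tokens
    v = v / v.norm(dim=-1, keepdim=True)
    cos = (v[:-1] * v[1:]).sum(-1).clamp(-1.0, 1.0)
    return torch.arccos(cos).mean().item()

before = sentence_curvature("The cat sat on the mat.", layer=5)
ablate_attention_key(model, layer=5)
after = sentence_curvature("The cat sat on the mat.", layer=5)
print(f"curvature before ablation: {before:.3f}, after: {after:.3f}")

In the actual study this comparison would be averaged over the 2000 UD-corpus sentences and repeated for every layer-module pair, as in the panels of Supplementary figure 1.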




Appendix A Distribution of Class Labels Across Each Probing Task

Neural Information Processing Systems

We also implemented the Iterative Null-Space Projection (INLP) method (Ravfogel et al., 2020) to remove linguistic properties. Results using our method are in Table 4; results using the INLP method show the same pattern, and this pattern holds across all of the linguistic properties that we tested. Each language brain region is not necessarily homogeneous in function across all of the voxels it contains. The bottom plot displays pretrained BERT vs. removal of all tasks. As in the probing experiments with BERT in the main paper, we also perform experiments with GPT2. We find the results to be similar to BERT, i.e., a rich hierarchy of linguistic signals: initial to middle layers encode surface information, middle layers encode syntax, and middle to top layers encode semantic information. We verify that the removal of each linguistic property from GPT2 leads to reduced task performance across all layers, as expected.
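For reference, the INLP procedure cited above can be sketched in a few lines. This is a simplified illustration of Ravfogel et al. (2020) rather than the authors' implementation: it fits a linear probe for the target property, projects the representations onto the null space of the probe's weights, and iterates; the toy data, iteration count, and probe choice are illustrative assumptions.

# Compact sketch of Iterative Null-Space Projection (INLP): repeatedly fit a
# linear probe for the property, then project representations onto the null
# space of the probe's weights so the property becomes linearly unrecoverable.
# Hyperparameters and the toy data below are illustrative, not from the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

def nullspace_projection(W):
    """Projection matrix onto the null space of the rows of W (d x d)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    rank = int(np.sum(S > 1e-10))
    B = Vt[:rank]                       # orthonormal basis of W's row space
    return np.eye(W.shape[1]) - B.T @ B

def inlp(X, y, n_iters=10):
    """Return a projection P such that X @ P hides `y` from linear probes."""
    P = np.eye(X.shape[1])
    for _ in range(n_iters):
        clf = LogisticRegression(max_iter=1000).fit(X @ P, y)
        P = nullspace_projection(clf.coef_) @ P   # stack successive projections
    return P

# Toy usage: random "hidden states" with a linearly encoded binary property.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64))
y = (X[:, 0] + 0.1 * rng.normal(size=2000) > 0).astype(int)
P = inlp(X, y)
acc = LogisticRegression(max_iter=1000).fit(X @ P, y).score(X @ P, y)
print(f"probe accuracy after INLP: {acc:.2f}")    # near chance (~0.5)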


