

88dddaf430b5bc38ab8228902bb61821-Supplemental-Conference.pdf

Neural Information Processing Systems

Supplementary figure 1. Ablation study. Each row represents the ablated layer and each column the module that is ablated from that layer; for example, the first panel shows ablation of the attention key in layer 5. Different layers of the GPT2-XL model were ablated, and the consequence of ablation on curvature was measured for 2000 sentences from the UD corpus. The red bar shows the layer where ablation was applied.

Supplementary figure 3. A. Curvature values for 2000 sampled sentences in the RWKV model (an RNN), for both the trained and the untrained version. B. Correlation between model-generated surprisal and curvature in the RWKV model. Diamonds: syntactic surprisal.

Supplementary figure 5. Effect of different decoding strategies on GPT2-XL sequence generation, compared to the ground truth ("true"); same as figure 4b in the main manuscript.
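The captions above refer to a curvature measure computed over a model's hidden-state trajectory for each sentence. The paper's exact definition is not reproduced here; a common choice, sketched below as an assumption, is the average angle between consecutive difference vectors of the per-token hidden states (`average_curvature` is a hypothetical helper name):

```python
import numpy as np

def average_curvature(states: np.ndarray) -> float:
    """Mean curvature of a hidden-state trajectory.

    states: (T, d) array with one hidden-state vector per token.
    Curvature at step t is taken as the angle between consecutive
    difference vectors h_{t+1}-h_t and h_t-h_{t-1}.
    NOTE: this is an illustrative reconstruction; the paper's exact
    curvature measure may differ.
    """
    diffs = np.diff(states, axis=0)              # (T-1, d) step vectors
    v_prev, v_next = diffs[:-1], diffs[1:]
    cos = np.sum(v_prev * v_next, axis=1) / (
        np.linalg.norm(v_prev, axis=1) * np.linalg.norm(v_next, axis=1)
    )
    cos = np.clip(cos, -1.0, 1.0)                # guard against rounding
    return float(np.mean(np.arccos(cos)))        # angle in radians
```

Under this definition a perfectly straight trajectory has curvature 0, and sharper turns between successive token steps push the average toward pi.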




Appendix Table of Contents


The number of layers is 12 for GPT2 and the randomly initialized model, and 24 for iGPT. Note that these notations are sometimes used interchangeably as long as the distinction does not significantly matter. The activations to be analyzed are the outputs from all layers. The CKA computation is shown in Figure 1. The design of the diagram is based on a previous study [35]. Figure 11: Activations we consider to compute CKA.
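CKA here compares representations between layers or models from their activation matrices. As a reference point, the standard linear variant of CKA (Kornblith et al.) on two centered activation matrices can be sketched as follows; this is a minimal illustration, not necessarily the exact variant used in the figures, and `linear_cka` is a hypothetical helper name:

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2),
    where n is the number of examples (rows are paired across X and Y)."""
    # Center each feature dimension
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return float(hsic / (norm_x * norm_y))
```

Linear CKA is invariant to isotropic scaling and to orthogonal transformations of either representation, which is why it is a common choice for comparing layers whose feature dimensions differ.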


