Universal Neurons in GPT-2: Emergence, Persistence, and Functional Impact

Nandan, Advey, Chou, Cheng-Ting, Kurakula, Amrit, Blondin, Cole, Zhu, Kevin, Sharma, Vasu, O'Brien, Sean

arXiv.org Artificial Intelligence

We investigate the phenomenon of neuron universality in independently trained GPT-2 Small models, examining how these universal neurons (neurons with consistently correlated activations across models) emerge and evolve throughout training. By analyzing five GPT-2 models at five training checkpoints, we identify universal neurons through pairwise correlation analysis of activations over a dataset of 5 million tokens. Ablation experiments reveal significant functional impacts of universal neurons on model predictions, measured via cross-entropy loss. Additionally, we quantify neuron persistence, demonstrating the high stability of universal neurons across training checkpoints, particularly in early and deeper layers. These findings suggest that stable and universal representational structures emerge during language model training.
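The pairwise correlation analysis the abstract describes can be sketched as follows; this is a minimal illustration, not the paper's code, and the 0.5 universality threshold and toy data are assumptions for demonstration.

```python
import numpy as np

def max_pairwise_correlation(acts_a, acts_b):
    """For each neuron in model A, find its best-correlated neuron in model B.

    acts_a: (n_tokens, n_neurons_a) activations from model A
    acts_b: (n_tokens, n_neurons_b) activations from model B
    Returns the max absolute Pearson correlation for each neuron of A.
    """
    # Standardize activations so a scaled dot product equals Pearson correlation.
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)          # (n_neurons_a, n_neurons_b)
    return np.abs(corr).max(axis=1)  # best match for each neuron in A

# Toy example: model B's neurons are a noisy permutation of model A's,
# so every neuron in A has a strong counterpart in B.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(1000, 8))
acts_b = acts_a[:, ::-1] + 0.1 * rng.normal(size=(1000, 8))
universal = max_pairwise_correlation(acts_a, acts_b) > 0.5  # hypothetical cutoff
```

In practice this comparison would run over activations collected from a large token corpus and be repeated for every model pair.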


Appendix Gigastep - One Billion Steps per Second Multi-agent Reinforcement Learning

Neural Information Processing Systems

In competitive or adversarial MARL, an objective reward measure is not defined, since the collected reward inherently depends on the relative strength of the opposing agent's policy. For instance, for the identical 5 vs 5 scenario, Figure 1a plots the win rate of a policy at different checkpoints (x-axis) against the same policy at every other checkpoint of the training process (y-axis). Similarly, Figure 7 shows the win rates from the perspective of the team A policy. In this section, we describe the behaviors that emerged after training in some of Gigastep's scenarios. This analysis serves as a demonstration that the tasks considered in Gigastep allow intelligent and collaborative behavior to emerge through MARL algorithms. We study the identical 20 vs 20 and the special 5 vs 1 scenarios here. A diverse set of behaviors was discovered using the baseline training method.
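The checkpoint-versus-checkpoint evaluation behind a plot like Figure 1a can be sketched as below; `play_match` is an assumed callable (not from Gigastep's API) that returns 1 if the first policy wins a game and 0 otherwise.

```python
import numpy as np

def checkpoint_winrate_matrix(checkpoints, play_match, n_games=10):
    """Win rate of checkpoint i (rows) versus checkpoint j (cols),
    estimated over n_games matches per pairing."""
    n = len(checkpoints)
    wr = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            wins = sum(play_match(checkpoints[i], checkpoints[j])
                       for _ in range(n_games))
            wr[i, j] = wins / n_games
    return wr

# Toy example: policy "strength" grows with checkpoint index, so later
# checkpoints beat earlier ones (ties awarded to the first policy).
play = lambda a, b: 1 if a >= b else 0
wr = checkpoint_winrate_matrix([1, 2, 3], play, n_games=4)
```

Reading the resulting matrix row by row shows whether training monotonically improves the policy relative to its own past selves, which is the point of the figure.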


A training

Neural Information Processing Systems

Table 4 describes the hyperparameters for pre-training the baseline and PLD. Eqn. 5 characterizes the gradient. Figure 1 shows the full comparison of the baseline and PLD, fine-tuned at different checkpoints; in particular, the fine-tuning results are often much worse with a large learning rate. Figure 11 shows the fine-tuning results at different checkpoints, and Figure 12 shows convergence curves varying the keep ratio θ.


Analyze the Neurons, not the Embeddings: Understanding When and Where LLM Representations Align with Humans

Fedzechkina, Masha, Gualdoni, Eleonora, Williamson, Sinead, Metcalf, Katherine, Seto, Skyler, Theobald, Barry-John

arXiv.org Artificial Intelligence

Modern large language models (LLMs) achieve impressive performance on some tasks, while exhibiting distinctly non-human-like behaviors on others. This raises the question of how well the LLM's learned representations align with human representations. In this work, we introduce a novel approach to the study of representation alignment: we adopt a method from research on activation steering to identify neurons responsible for specific concepts (e.g., 'cat') and then analyze the corresponding activation patterns. Our findings reveal that LLM representations closely align with human representations inferred from behavioral data. Notably, this alignment surpasses that of word embeddings, which have been center stage in prior work on human and model alignment. Additionally, our approach enables a more granular view of how LLMs represent concepts. Specifically, we show that LLMs organize concepts in a way that reflects hierarchical relationships interpretable to humans (e.g., 'animal'-'dog').
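A common heuristic in the activation-steering literature for identifying concept neurons is a difference-in-means ranking; the sketch below is illustrative (the function name, the top-k cutoff, and the toy data are assumptions, not the paper's method in detail).

```python
import numpy as np

def concept_neurons(concept_acts, baseline_acts, top_k=5):
    """Rank neurons by the gap between their mean activation on inputs
    mentioning a concept (e.g., 'cat') and on baseline inputs.

    concept_acts, baseline_acts: (n_inputs, n_neurons) activation matrices.
    Returns indices of the top_k most concept-selective neurons.
    """
    diff = concept_acts.mean(0) - baseline_acts.mean(0)
    return np.argsort(-np.abs(diff))[:top_k]

# Toy data: neuron 3 is artificially made concept-selective.
rng = np.random.default_rng(1)
baseline = rng.normal(size=(200, 16))
concept = rng.normal(size=(200, 16))
concept[:, 3] += 5.0
top = concept_neurons(concept, baseline, top_k=3)
```

Once such neurons are identified, their activation patterns across a vocabulary of concepts can be compared against human similarity judgments, which is the alignment analysis the abstract describes.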


Measuring Interventional Robustness in Reinforcement Learning

Avery, Katherine, Kenney, Jack, Amaranath, Pracheta, Cai, Erica, Jensen, David

arXiv.org Artificial Intelligence

Recent work in reinforcement learning has focused on several characteristics of learned policies that go beyond maximizing reward. These properties include fairness, explainability, generalization, and robustness. In this paper, we define interventional robustness (IR), a measure of how much variability is introduced into learned policies by incidental aspects of the training procedure, such as the order of training data or the particular exploratory actions taken by agents. A training procedure has high IR when the agents it produces take very similar actions under intervention, despite variation in these incidental aspects of the training procedure. We develop an intuitive, quantitative measure of IR and calculate it for eight algorithms in three Atari environments across dozens of interventions and states. From these experiments, we find that IR varies with the amount of training and the type of algorithm, and that, contrary to what one might expect, high performance does not imply high IR.
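A simplified, agreement-based stand-in for an IR-style measure can be sketched as follows; this is not the paper's exact formula, just an illustration of the idea that policies trained under different incidental conditions (e.g., seeds) should choose the same actions in the same states.

```python
import numpy as np

def interventional_robustness(policies, states):
    """Mean pairwise action agreement across independently trained policies.

    policies: deterministic callables mapping a state to a discrete action.
    states: the probe states at which policies are compared.
    Returns a value in [0, 1]; higher means more robust to training variation.
    """
    actions = np.array([[p(s) for s in states] for p in policies])
    n = len(policies)
    agree = [
        (actions[i] == actions[j]).mean()
        for i in range(n) for j in range(i + 1, n)
    ]
    return float(np.mean(agree))

# Two toy policies that agree on 3 of 4 probe states.
p1 = lambda s: s % 2
p2 = lambda s: (s % 2) if s < 3 else 0
ir = interventional_robustness([p1, p2], states=[0, 1, 2, 3])
```

Under this toy measure, a training procedure whose runs all converge to the same action choices scores 1.0 regardless of how much reward those actions collect, which mirrors the paper's finding that performance and robustness are separate axes.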


[D] Machine Learning - WAYR (What Are You Reading) - Week 93

#artificialintelligence

Deep Ensembles: A Loss Landscape Perspective: This paper digs into why an ensemble of deep networks works better than a single deep network. The authors conduct a qualitative investigation that demystifies some of the inner workings of deep neural nets. Here are some of the observations: the same model trained from different random initializations is functionally dissimilar. A neural network maps inputs to outputs and thus acts as a function (which is what we learn). If we start from init1 we end up with function1, which is not similar to the function learned by the same model trained from init2. However, snapshots of the same model taken at different epochs of a single training run are functionally similar.
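One simple proxy for the "functional dissimilarity" discussed above is prediction disagreement: the fraction of inputs on which two models output different classes. A minimal sketch, with made-up toy labels standing in for model outputs:

```python
import numpy as np

def disagreement(preds_a, preds_b):
    """Fraction of inputs where two models predict different classes."""
    return float(np.mean(np.asarray(preds_a) != np.asarray(preds_b)))

# Hypothetical predicted labels: two independently initialized models
# disagree often, while two epoch snapshots of one run barely disagree.
init1 = [0, 1, 1, 0, 2, 1]
init2 = [0, 2, 1, 1, 2, 0]
snap_e9 = [0, 1, 1, 0, 2, 1]
snap_e10 = [0, 1, 1, 0, 2, 2]

cross_init = disagreement(init1, init2)      # different inits
within_run = disagreement(snap_e9, snap_e10)  # same run, adjacent epochs
```

High disagreement between independently initialized models is exactly what makes averaging their predictions worthwhile: the ensemble members make different mistakes.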