Tackling Polysemanticity with Neuron Embeddings

Foote, Alex

arXiv.org Artificial Intelligence 

We present neuron embeddings, a representation that can be used to tackle polysemanticity by identifying the distinct semantic behaviours in a neuron's characteristic dataset examples, making downstream manual or automatic interpretation much easier. We apply our method to GPT2-small, and provide a UI for exploring the results. Neuron embeddings are computed using a model's internal representations and weights, making them domain and architecture agnostic and removing the risk of introducing external structure which may not reflect a model's actual computation. We describe how neuron embeddings can be used to ...

One common method for interpreting the behaviour of a neuron in a language model is to collect and study the dataset examples which cause the highest neuron activation. Patterns in a neuron's dataset examples provide an indication of what the neuron responds to. However, polysemanticity makes these dataset examples much harder to interpret, as there are often many separate behaviours to understand, some of which may be related and others entirely distinct. This becomes increasingly challenging as you collect examples further down the activation spectrum, which is important for gaining a complete understanding of a neuron, but often reveals a wider range of behaviours (Bolukbasi et al., 2021).
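The practice of collecting a neuron's dataset examples, both at the top of the activation range and further down the activation spectrum, can be illustrated with a minimal sketch. This is not the paper's implementation. The function names `top_activating_examples` and `examples_by_activation_band` are hypothetical, the equal-width banding scheme is an assumption made for illustration, and the activation values are toy numbers, not outputs of GPT2-small.

```python
def top_activating_examples(activations, k=5):
    """Return the indices of the k examples with the highest activation.

    `activations[i]` is assumed to hold one neuron's peak activation on
    dataset example i (a hypothetical, precomputed list of floats).
    """
    # sorted() is stable, so ties keep their original dataset order.
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    return order[:k]


def examples_by_activation_band(activations, n_bands=4, per_band=2):
    """Pick examples from equal-width activation bands, strongest band first.

    Bands split the range (0, peak] into n_bands intervals of equal width.
    Examples with zero or negative activation are ignored.
    """
    peak = max(activations, default=0.0)
    if peak <= 0:
        return []
    width = peak / n_bands
    bands = []
    for b in range(n_bands, 0, -1):
        lo = (b - 1) * width
        # Use the exact peak as the top edge to avoid rounding it out of range.
        hi = peak if b == n_bands else b * width
        in_band = [i for i, a in enumerate(activations) if lo < a <= hi]
        in_band.sort(key=lambda i: -activations[i])
        bands.append(in_band[:per_band])
    return bands


# Toy activations for seven dataset examples (illustrative values only).
toy_activations = [0.1, 3.9, 2.0, 4.0, 0.0, 1.2, 3.1]
top3 = top_activating_examples(toy_activations, k=3)
banded = examples_by_activation_band(toy_activations, n_bands=4, per_band=2)
```

Looking only at `top3` would surface the strongest examples, while `banded` also exposes weaker examples, which is where a polysemantic neuron's additional behaviours would tend to appear.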
