Tackling Polysemanticity with Neuron Embeddings

Nov-12-2024–arXiv.org Artificial Intelligence

We present neuron embeddings, a representation that can be used to tackle polysemanticity by identifying the distinct semantic behaviours in a neuron's characteristic dataset examples, making downstream manual or automatic interpretation much easier. We apply our method to GPT2-small, and provide a UI for exploring the results. Neuron embeddings are computed using a model's internal representations and weights, making them domain and architecture agnostic and removing the risk of introducing external structure which may not reflect a model's actual computation. We describe how neuron embeddings can be used to ...

One common method for interpreting the behaviour of a neuron in a language model is to collect and study the dataset examples which cause the highest neuron activation. Patterns in a neuron's dataset examples provide an indication of what the neuron responds to. However, polysemanticity makes these dataset examples much harder to interpret, as there are often many separate behaviours to understand, some of which may be related and others entirely distinct. This becomes increasingly challenging as you collect examples further down the activation spectrum, which is important for gaining a complete understanding of a neuron, but often reveals a wider range of behaviours (Bolukbasi et al., 2021).
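
The collection step mentioned above (gathering the dataset examples that most strongly activate a neuron, and then looking further down the activation spectrum) can be illustrated with a minimal sketch. It is not the paper's method for computing neuron embeddings; it only shows the example-gathering step. It assumes the activations of one neuron over a dataset have already been computed and stored in a 1-D array. The function names `top_activating_examples` and `examples_by_activation_band`, and the choice of equal-width activation bands, are hypothetical and introduced only for illustration.

```python
import numpy as np


def top_activating_examples(activations, k=5):
    """Return (indices, values) of the k largest activations, highest first."""
    activations = np.asarray(activations, dtype=float)
    k = min(k, activations.size)
    # Stable sort on the negated values gives descending order
    # with ties kept in their original dataset order.
    order = np.argsort(-activations, kind="stable")[:k]
    return order, activations[order]


def examples_by_activation_band(activations, n_bands=4, per_band=2):
    """Pick examples from equal-width bands between 0 and the max activation.

    Assumption for this sketch: band b covers the half-open interval
    (lower, upper], so examples with zero or negative activation are skipped.
    Returns one index array per band, lowest band first.
    """
    activations = np.asarray(activations, dtype=float)
    edges = np.linspace(0.0, activations.max(), n_bands + 1)
    bands = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        in_band = np.flatnonzero((activations > lower) & (activations <= upper))
        # Within a band, keep the strongest examples first.
        ranked = in_band[np.argsort(-activations[in_band], kind="stable")]
        bands.append(ranked[:per_band])
    return bands


# Toy activations for six dataset examples (made-up numbers).
acts = np.array([0.1, 2.0, 0.0, 1.5, 0.9, 0.4])
top_idx, top_vals = top_activating_examples(acts, k=3)
low_band, high_band = examples_by_activation_band(acts, n_bands=2, per_band=2)
```

In this toy case the top-3 list contains only the strongest examples, while the banded view also surfaces weaker activations, which is where a polysemantic neuron may show additional behaviours.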

artificial intelligence, machine learning, natural language

Country:
- Europe > Austria > Vienna (0.14)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Neural Networks (1.00)
    - Statistical Learning > Clustering (0.46)
