Locating Factual Knowledge in Large Language Models: Exploring the Residual Stream and Analyzing Subvalues in Vocabulary Space
–arXiv.org Artificial Intelligence
We locate factual knowledge in large language models by exploring the residual stream and analyzing subvalues in vocabulary space. We explain why subvalues carry human-interpretable concepts when projected into vocabulary space: the pre-softmax values of subvalues are combined by addition, so the probability of a subvalue's top tokens in vocabulary space increases. Based on this, we find that log probability increase is a better measure of the significance of layers and subvalues than probability increase, since the curve of log probability increase has a linear, monotonically increasing shape. Moreover, we compute inner products to evaluate how strongly a feed-forward network (FFN) subvalue is activated by previous layers. Based on our methods, we find where factual knowledge is stored. Specifically, attention layers store "Paris is related to France", while FFN layers store "Paris is a capital/city" and are activated by attention subvalues related to "capital". We apply our method to Baevski-18, GPT2 medium, Llama-7B and Llama-13B. Overall, we provide a new method for understanding the mechanism of transformers. We will release our code on GitHub.
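The two measurements named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code: the function names `token_gain` and `key_inner_products`, the identity unembedding matrix, and all toy vectors are assumptions made for illustration, and layer normalization is ignored. The first function adds a subvalue to the residual stream before the softmax and reports both the probability increase and the log probability increase of a target token. The second takes the inner product of each earlier contribution to the residual stream with a hypothetical FFN key vector.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def token_gain(residual, subvalue, unembed, token_id):
    """Return (probability increase, log probability increase) of token_id
    when `subvalue` is added to `residual` before projecting to vocabulary space."""
    before = log_softmax(unembed @ residual)
    after = log_softmax(unembed @ (residual + subvalue))
    prob_gain = np.exp(after[token_id]) - np.exp(before[token_id])
    log_prob_gain = after[token_id] - before[token_id]
    return float(prob_gain), float(log_prob_gain)


def key_inner_products(contributions, ffn_key):
    """Inner product of each earlier contribution (one per row) with an FFN key.
    The inner product is linear, so the rows sum to the total pre-activation input."""
    return np.asarray(contributions) @ ffn_key


# Toy setup (assumed): 4-token vocabulary with an identity unembedding matrix.
unembed = np.eye(4)
residual = np.zeros(4)
token_id = 2
direction = unembed[token_id]

# Score the same subvalue direction at three increasing scales.
gains = [token_gain(residual, scale * direction, unembed, token_id)
         for scale in (1.0, 2.0, 3.0)]

# Two hypothetical earlier-layer contributions and one hypothetical FFN key.
contributions = np.array([[0.5, 0.0, 1.0, 0.0],
                          [0.0, 0.2, -0.3, 0.0]])
ffn_key = np.array([1.0, 0.0, 2.0, 0.0])
per_source = key_inner_products(contributions, ffn_key)

for prob_gain, log_prob_gain in gains:
    print(f"prob gain {prob_gain:.4f}  log-prob gain {log_prob_gain:.4f}")
print("per-source inner products:", per_source)
```

Because the inner product is linear, the per-source values sum to the inner product of the full residual input with the key, which is what makes it possible to attribute an FFN subvalue's activation to individual earlier layers in this toy setting.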
Jan-30-2024
- Genre:
- Research Report (0.64)
- Technology: