Evolution of Concepts in Language Model Pre-Training
Ge, Xuyang, Shu, Wentao, Wu, Jiaxing, Zhou, Yunhua, He, Zhengfu, Qiu, Xipeng
arXiv.org Artificial Intelligence
Language models obtain extensive capabilities through pre-training. However, the pre-training process remains a black box. In this work, we track the evolution of linear interpretable features across pre-training snapshots using a sparse dictionary learning method called crosscoders. We find that most features begin to form around a specific point in training, while more complex patterns emerge in later training stages. Feature attribution analyses reveal causal connections between feature evolution and downstream performance. Our feature-level observations are highly consistent with previous findings on the Transformer's two-stage learning process, whose stages we term a statistical learning phase and a feature learning phase. Our work opens up the possibility of tracking fine-grained representation progress throughout language model training.
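To make the crosscoder idea more concrete, here is a minimal NumPy sketch of one common formulation: each snapshot gets its own encoder and decoder weights, while all snapshots share a single sparse feature vector. This is an illustration under stated assumptions, not the authors' implementation. The ReLU activation, the decoder-norm-weighted L1 penalty, the function names, and all shapes are assumptions chosen for the example. The weights are random, so the outputs carry no meaning beyond their shapes.

```python
import numpy as np

def crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec):
    """Encode activations from S snapshots into one shared feature vector.

    acts:  (S, B, D) activations for the same B tokens at S snapshots
    W_enc: (S, D, F) per-snapshot encoder weights
    b_enc: (F,)      shared encoder bias
    W_dec: (S, F, D) per-snapshot decoder weights
    b_dec: (S, D)    per-snapshot decoder bias
    """
    # Sum the encoder contributions of all snapshots, then apply ReLU (assumed).
    pre = np.einsum("sbd,sdf->bf", acts, W_enc) + b_enc
    feats = np.maximum(pre, 0.0)
    # Each snapshot reconstructs its own activations from the shared features.
    recon = np.einsum("bf,sfd->sbd", feats, W_dec) + b_dec[:, None, :]
    return feats, recon

def crosscoder_loss(acts, feats, recon, W_dec, l1_coeff=1e-3):
    """Reconstruction error plus a decoder-norm-weighted sparsity penalty (assumed form)."""
    mse = np.mean(np.sum((recon - acts) ** 2, axis=(0, 2)))
    dec_norms = np.linalg.norm(W_dec, axis=2)            # (S, F)
    sparsity = np.mean(np.sum(feats * dec_norms.sum(axis=0), axis=1))
    return mse + l1_coeff * sparsity

def feature_strength_by_snapshot(W_dec):
    """Per-snapshot decoder norm of each feature, one possible proxy for its presence."""
    return np.linalg.norm(W_dec, axis=2)                 # (S, F)

# Toy example with random weights; sizes are hypothetical.
rng = np.random.default_rng(0)
S, B, D, F = 3, 4, 8, 16
acts = rng.normal(size=(S, B, D))
W_enc = rng.normal(size=(S, D, F)) * 0.1
b_enc = np.zeros(F)
W_dec = rng.normal(size=(S, F, D)) * 0.1
b_dec = np.zeros((S, D))

feats, recon = crosscoder_forward(acts, W_enc, b_enc, W_dec, b_dec)
loss = crosscoder_loss(acts, feats, recon, W_dec)
strength = feature_strength_by_snapshot(W_dec)
```

In a trained crosscoder of this form, reading one column of `strength` across snapshots would give a rough picture of when a feature forms during pre-training.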
Sep-23-2025
- Country:
  - Africa
    - Ethiopia > Addis Ababa
      - Addis Ababa (0.04)
    - Rwanda > Kigali
      - Kigali (0.04)
  - Asia
    - China > Shanghai
      - Shanghai (0.04)
    - Middle East
      - Israel > Jerusalem District
        - Jerusalem (0.04)
      - Saudi Arabia > Asir Province
        - Abha (0.04)
    - Singapore (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - France (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Italy > Calabria
      - Catanzaro Province > Catanzaro (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America
    - Canada > British Columbia
      - Vancouver (0.04)
    - United States
      - Colorado > Denver County
        - Denver (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
  - Oceania > Australia
    - New South Wales > Sydney (0.04)
  - South America > Colombia
    - Meta Department > Villavicencio (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Technology: