Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

He, Zhengfu, Shu, Wentao, Ge, Xuyang, Chen, Lingjie, Wang, Junxuan, Zhou, Yunhua, Liu, Frances, Guo, Qipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang, Qiu, Xipeng

Oct-27-2024–arXiv.org Artificial Intelligence

A major challenge in training SAEs is the storage and throughput that latent activations demand. Text data requires only 2 bytes per token, whereas latent activations occupy 8K bytes per token, a 4,096x increase in both storage and disk throughput. Because training steps for shallow SAEs are relatively fast, data loading quickly becomes the main bottleneck. Given these infrastructure constraints, we do not save activations in advance but generate them on the fly. This contrasts with the approach of Lieberum et al. (2024) and Templeton et al. (2024b), who pre-save activations and build a high-speed data-loading pipeline to keep up with training. To manage on-the-fly generation, we adopt a producer-consumer model: language models (LMs) generate activations and store them in an activation buffer, while the SAEs consume the activations in random order. The process is serialized: once the buffer is full, SAE training begins, and when half the buffer has been consumed, the LMs refill it. Each time the buffer is refilled, we shuffle it, which introduces randomness into the training data without needing to save and shuffle all activations at once.
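
The serialized producer-consumer loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name ActivationBuffer, the helper make_fake_lm, and the parameters capacity and batch_size are hypothetical, and plain Python lists stand in for real activation tensors produced by a language model.

```python
import random
from typing import Callable, List

Activation = List[float]  # stand-in for one token's activation vector


class ActivationBuffer:
    """Hypothetical serialized producer-consumer buffer for SAE training."""

    def __init__(self, produce: Callable[[int], List[Activation]],
                 capacity: int, seed: int = 0) -> None:
        self.produce = produce        # assumption: wraps a forward pass of the LM
        self.capacity = capacity      # maximum number of activations held
        self.rng = random.Random(seed)
        self.items: List[Activation] = []
        self.refills = 0

    def _refill(self) -> None:
        # Producer step: top the buffer up to capacity, then shuffle the
        # whole buffer so old and new activations are mixed.
        missing = self.capacity - len(self.items)
        self.items.extend(self.produce(missing))
        self.rng.shuffle(self.items)
        self.refills += 1

    def next_batch(self, batch_size: int) -> List[Activation]:
        # Consumer step: refill first if at least half the buffer is used up
        # (this also covers the initial, empty buffer).
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if len(self.items) <= self.capacity // 2:
            self._refill()
        batch = self.items[-batch_size:]
        del self.items[-batch_size:]
        return batch


def make_fake_lm(d_model: int = 4) -> Callable[[int], List[Activation]]:
    """Toy producer: each 'activation' is a constant vector with a unique id."""
    counter = 0

    def produce(n: int) -> List[Activation]:
        nonlocal counter
        out = [[float(counter + i)] * d_model for i in range(n)]
        counter += n
        return out

    return produce


# Usage: draw a few batches; the "SAE" here would train on each batch.
buffer = ActivationBuffer(make_fake_lm(), capacity=8, seed=0)
demo_batches = [buffer.next_batch(2) for _ in range(4)]
```

Because producing and consuming alternate rather than run concurrently, no locking is needed in this sketch. Shuffling only the buffer gives approximate rather than global randomness, which is the trade-off the paragraph above describes.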

large language model, machine learning, natural language, (18 more...)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.85)