Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders

He, Zhengfu; Shu, Wentao; Ge, Xuyang; Chen, Lingjie; Wang, Junxuan; Zhou, Yunhua; Liu, Frances; Guo, Qipeng; Huang, Xuanjing; Wu, Zuxuan; Jiang, Yu-Gang; Qiu, Xipeng

arXiv.org Artificial Intelligence 

One of the major challenges in training SAEs is the substantial storage and throughput required for latent activations. While text data requires only 2 bytes per token, latent activations occupy 8K bytes per token, a 4,096x increase in both storage needs and disk throughput. Combined with the relatively fast training steps of shallow SAEs, this means that data loading quickly becomes the main bottleneck in the training process. Due to these infrastructure constraints, we do not save activations in advance but instead generate them on the fly. This contrasts with the approach taken by Lieberum et al. (2024) and Templeton et al. (2024b), where activations are pre-saved and a high-speed data-loading pipeline is built to keep up with training. To generate activations on the fly, we adopt a producer-consumer model: Language Models (LMs) generate activations and store them in an activation buffer, while the SAEs consume the activations in random order. The process is serialized, so generation and training alternate rather than run concurrently: once the buffer is full, SAE training begins, and when half the buffer has been consumed, training pauses while the LMs refill it. Each time the buffer is refilled, we shuffle it, which introduces randomness into the training data without needing to save and shuffle all activations at once.
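The serialized buffer described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `CountingProducer` is a hypothetical stand-in for the LM forward pass, `ActivationBuffer`, `capacity`, and `batch_size` are names invented here, and the refill trigger ("at most half the buffer remains") is one plausible reading of "when half the buffer is consumed".

```python
import random
from typing import Callable, List, Tuple

# One "activation" here is (token_index, vector). A real pipeline would hold
# LM hidden states instead. All names in this sketch are hypothetical.
Activation = Tuple[int, List[float]]


class CountingProducer:
    """Stand-in for the LM: emits dummy activations and counts how many it made."""

    def __init__(self, d_model: int = 4) -> None:
        self.d_model = d_model
        self.count = 0

    def __call__(self, n: int) -> List[Activation]:
        start = self.count
        self.count += n
        return [(i, [float(i)] * self.d_model) for i in range(start, start + n)]


class ActivationBuffer:
    """Serialized producer-consumer buffer: fill, shuffle, consume, refill at half."""

    def __init__(
        self,
        produce: Callable[[int], List[Activation]],
        capacity: int,
        batch_size: int,
        seed: int = 0,
    ) -> None:
        # Requiring batch_size <= capacity // 2 guarantees every batch is full.
        if not 0 < batch_size <= capacity // 2:
            raise ValueError("batch_size must be in (0, capacity // 2]")
        self.produce = produce
        self.capacity = capacity
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.items: List[Activation] = []

    def _refill(self) -> None:
        # Producer step: top the buffer back up to capacity, then shuffle the
        # whole buffer so leftover and fresh activations are mixed.
        missing = self.capacity - len(self.items)
        self.items.extend(self.produce(missing))
        self.rng.shuffle(self.items)

    def next_batch(self) -> List[Activation]:
        # Consumer step: refill first if at most half the buffer remains
        # (this also covers the initial, empty buffer).
        if len(self.items) <= self.capacity // 2:
            self._refill()
        batch = self.items[-self.batch_size:]
        del self.items[-self.batch_size:]
        return batch


if __name__ == "__main__":
    producer = CountingProducer(d_model=4)
    buffer = ActivationBuffer(producer, capacity=8, batch_size=2)
    for step in range(10):
        batch = buffer.next_batch()  # an SAE training step would consume this
        print(step, [token_index for token_index, _ in batch])
```

Because each refill shuffles the leftover half together with freshly generated activations, a given activation can survive several refills before it is drawn, which spreads each LM batch across many SAE steps while keeping only `capacity` activations in memory at any time.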