Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference

Open in new window