Improving Neuron-level Interpretability with White-box Language Models