Transcoders Find Interpretable LLM Feature Circuits

Neural Information Processing Systems 

A key goal in mechanistic interpretability is circuit analysis: finding sparse sub-graphs of models corresponding to specific behaviors or capabilities.
