Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

Zhengxuan Wu, Atticus Geiger

Neural Information Processing Systems 

Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety.
