ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs
