Is Multi-Hop Reasoning Really Explainable? Towards Benchmarking Reasoning Interpretability
