Evaluating Interventional Reasoning Capabilities of Large Language Models