Learning Audio Concepts from Counterfactual Natural Language