Can Large Language Models Learn Independent Causal Mechanisms?