Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning

Changsheng Lv