When Is Compositional Reasoning Learnable from Verifiable Rewards?