Test-Time Matching: Unlocking Compositional Reasoning in Multimodal Models