Towards a Unified Multimodal Reasoning Framework