Towards Robust Multimodal Representation: A Unified Approach with Adaptive Experts and Alignment