CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs