Beyond Artificial Misalignment: Detecting and Grounding Semantic-Coordinated Multimodal Manipulations