CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference