CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation