CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder