Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection