How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis