Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers