Captured by Captions: On Memorization and its Mitigation in CLIP Models