Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning