Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning