AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding