A Survey on Image-text Multimodal Models