The Evolution of Multimodal Model Architectures