Rethinking Visual Information Processing in Multimodal LLMs