Be My Eyes: Extending Large Language Models to New Modalities Through Multi-Agent Collaboration