The Narrow Gate: Localized Image-Text Communication in Vision-Language Models