Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation