Think Visually, Reason Textually: Vision-Language Synergy in ARC