Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring