Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs