Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models