Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
