Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
