Toward Interactive Regional Understanding in Vision-Large Language Models
