Toward Interactive Regional Understanding in Vision-Large Language Models