LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
